***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. 
[OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb --------------------------------------------------............. [NO] DeepSpeed C++/CUDA extension op report....... --------------------------------------------------[OKAY] NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... 
[OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. 
compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... 
[OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. 
compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. 
compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... 
[OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] transformer ............ [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... 
[NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer_inference .. [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible ninja .................. [OKAY] -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- fused_adam ............. [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] fused_lamb ............. [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. 
[NO][NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizer quantizer.............. ..............[NO] [NO]....... .......[OKAY] [OKAY] -------------------------------------------------- -------------------------------------------------- ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... 
[OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference transformer_inference.. ..[NO] [NO]....... ....... [OKAY][OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... 
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
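This environment block is emitted by DeepSpeed itself; the ds_report utility that ships with DeepSpeed prints the same block together with the op table on demand. A rough sketch of reproducing it by hand, assuming the __git_hash__/__git_branch__ attributes that the 0.4.x package exposes alongside __version__:

    # Sketch: reproduce the "DeepSpeed general environment info" block by hand.
    # torch.__version__ and torch.version.cuda are stable public attributes;
    # deepspeed.__git_hash__/__git_branch__ are assumed from the
    # "0.4.2+bc17042, bc17042, big-science" line above.
    import torch
    import deepspeed

    print("torch install path ...", torch.__path__)        # list, as in the log
    print("torch version ........", torch.__version__)     # 1.8.1 here
    print("torch cuda version ...", torch.version.cuda)    # 11.1 here
    print("deepspeed info .......", deepspeed.__version__,
          deepspeed.__git_hash__, deepspeed.__git_branch__)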
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
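Megatron shells out to /bin/sh to look for git, hence the type: git: not found line and the unknown hash/branch: git is simply not on the compute nodes' PATH, the checkout itself is fine. A minimal sketch of that kind of lookup (not Megatron's exact code):

    # Sketch: record the current git hash/branch the way a launcher might,
    # degrading to "unknown" when git is absent, as on these compute nodes.
    import subprocess

    def git_info(default="unknown"):
        try:
            git_hash = subprocess.check_output(
                ["git", "rev-parse", "--short", "HEAD"], text=True).strip()
            git_branch = subprocess.check_output(
                ["git", "rev-parse", "--abbrev-ref", "HEAD"], text=True).strip()
            return git_hash, git_branch
        except (OSError, subprocess.CalledProcessError):
            return default, default

    h, b = git_info()
    print(f"**** Git info for Megatron: git_hash={h} git_branch={b} ****")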
1.8.1 torch cuda version['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ............... 11.1 torch versionnvcc version ......................................... 1.8.111.2 deepspeed install pathtorch cuda version .......................... 11.1['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] nvcc versiondeepspeed info ........................................ 11.2 0.4.2+bc17042, bc17042, big-sciencedeepspeed install path deepspeed wheel compiled w............ ...... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']torch 1.8, cuda 11.1 deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 
11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO]async_io ....... ...............[NO] [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO] .......utils [OKAY].................. [YES] ...... [OKAY] utils ..................quantizer [YES].............. ......[NO] [OKAY] ....... [OKAY] quantizer ..............-------------------------------------------------- [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.1 11.1 nvcc version nvcc version..................... .....................11.2 11.2deepspeed install path deepspeed install path........... ...........['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed info deepspeed info................... ...................0.4.2+bc17042, bc17042, big-science 0.4.2+bc17042, bc17042, big-sciencedeepspeed wheel compiled w. deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] DeepSpeed general environment info: torch version .................... 1.8.1 torch cuda version ............... 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] nvcc version ..................... 11.2 torch version .................... 1.8.1 torch cuda version ............... 11.1 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 deepspeed wheel compiled w. 
...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... DeepSpeed general environment info: ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch install path .................... ...............1.8.1 torch cuda version ............... 11.1['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] nvcc version .....................torch version 11.2.................... deepspeed install path1.8.1 ........... torch cuda version ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']............... deepspeed info11.1 ...................nvcc version 0.4.2+bc17042, bc17042, big-science..................... deepspeed wheel compiled w.11.2 ......deepspeed install path torch 1.8, cuda 11.1........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] ninja .................. [OKAY] -------------------------------------------------- -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 
0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer_inference .. [NO] ....... [OKAY] async_io ............... utils[NO] ......................... [YES][NO] ...... [OKAY] quantizer .............. [NO] .......transformer_inference [OKAY].. [NO] .......-------------------------------------------------- [OKAY] utils .................. 
[YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- DeepSpeed general environment info: op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] fused_adam ............. [NO] ....... [OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science fused_lamb ............. [NO] ....... [OKAY] deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 stochastic_transformer . [NO] ....... [OKAY] torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. 
[NO][NO] .............. [NO][NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizer quantizer.............. ..............[NO] [NO]....... .......[OKAY] [OKAY] -------------------------------------------------- -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 
0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] /bin/sh: line 0: type: git: not found torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install path ...............torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... torch version1.8.1 .................... torch cuda version1.8.1 ............... 11.1torch cuda version nvcc version............... .....................11.1 11.2nvcc version deepspeed install path..................... ...........11.2 deepspeed install path['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ...........deepspeed info ...................['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] 0.4.2+bc17042, bc17042, big-sciencedeepspeed info deepspeed wheel compiled w.................... ......0.4.2+bc17042, bc17042, big-science torch 1.8, cuda 11.1 deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. 
Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY]
--------------------------------------------------
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
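The fields in the environment block above can be reproduced by hand from the same conda env. A minimal sketch, assuming torch and deepspeed import from that env and nvcc is on $PATH (DeepSpeed's own ds_report CLI prints this block together with the op report shown below):

    # Minimal sketch (not part of the training scripts): query the same
    # fields that the "DeepSpeed general environment info" block reports.
    import os
    import subprocess

    import torch
    import deepspeed

    print("torch install path ...", os.path.dirname(torch.__file__))
    print("torch version ........", torch.__version__)    # 1.8.1 above
    print("torch cuda version ...", torch.version.cuda)   # 11.1 above
    # nvcc comes from the CUDA toolkit on $PATH, not from the torch wheel.
    nvcc = subprocess.run(["nvcc", "--version"], capture_output=True, text=True)
    print("nvcc version .........", nvcc.stdout.splitlines()[-1])
    print("deepspeed install path", os.path.dirname(deepspeed.__file__))
    print("deepspeed info .......", deepspeed.__version__)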
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
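Per the NOTE in the report above, every op marked [NO] is built just-in-time on first use. A minimal sketch of forcing that build by hand, assuming a DeepSpeed where the builders are importable from deepspeed.ops.op_builder (true of upstream releases; this big-science fork may differ):

    # Sketch: JIT-build one of the [NO] ops from the report above on demand.
    # Requires ninja and the CUDA toolchain; the import path is an assumption.
    from deepspeed.ops.op_builder import FusedAdamBuilder

    builder = FusedAdamBuilder()
    print("compatible:", builder.is_compatible())  # mirrors the report column
    fused_adam = builder.load()  # compiles with ninja on first call, then caches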
[OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attnninja ............ [NO].................. .......[OKAY] [OKAY] -------------------------------------------------- op nametransformer ............................ installed[NO] ......... compatible [OKAY] -------------------------------------------------- stochastic_transformer . [NO] cpu_adam....... ...............[OKAY] [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] --------------------------------------------------....... [OKAY] DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. *****************************************  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] transformer_inference....... 
..[NO] [NO] ....... [OKAY] utils ..................transformer_inference [YES].. ......[NO] [OKAY]....... [OKAY] quantizer .............. [NO] .......utils [OKAY].................. [YES] ...... --------------------------------------------------[OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] DeepSpeed general environment info:torch version .................... 1.8.1 torch cuda version ...............torch install path 11.1............... nvcc version ..................... 11.2 deepspeed install path['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ........... torch version['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ....................deepspeed info 1.8.1................... 0.4.2+bc17042, bc17042, big-science torch cuda version deepspeed wheel compiled w................ ......11.1 torch 1.8, cuda 11.1nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op report-------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op report ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja JIT compiled ops requires ninjaJIT compiled ops requires ninja runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninjaJIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------JIT compiled ops requires ninjaJIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja ------------------------------------------------------------------------------------------------------------------------------------------------------ --------------------------------------------------DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
--------------------------------------------------JIT compiled ops requires ninja---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninjaJIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- ---------------------------------------------------------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja ninjaninjaninjaninja .................................... ..................[OKAY] .................. ninjaninjaninja ninja.................................... .................. [OKAY][OKAY] .................. [OKAY][OKAY] -------------------------------------------------- [OKAY][OKAY]-------------------------------------------------- -------------------------------------------------- ----------------------------------------------------------------------------------------------------op name ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- op name op name-------------------------------------------------- op name op nameop nameop name................ ................................................installed installedinstalled..installed ....compatible.. compatiblecompatiblecompatible-------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ................................ ................op nameinstalledinstalled ..installed.. ................ compatible..compatible installed--------------------------------------------------compatible-------------------------------------------------- cpu_adam ............... 
cpu_adamcpu_adam [YES]cpu_adam ............... ............... ..................... [YES] [YES] [YES] [OKAY]...... ...... ...... [OKAY] [OKAY] [OKAY] op nameop name op nameop name................................ ................installedinstalled................ installed..installed.. ....compatiblecompatible compatiblecompatible-------------------------------------------------- -------------------------------------------------- ---------------------------------------------------------------------------------------------------- ..-------------------------------------------------- compatible -------------------------------------------------- fused_adam .............fused_adam fused_adam [NO] ............. fused_adam............. ....... [NO] [NO].............[OKAY] ..............[NO] [OKAY][OKAY].......fused_lamb cpu_adamcpu_adam cpu_adamcpu_adam ............... ............... ..............................[YES] [YES] [YES] [YES]...... ..................[OKAY] cpu_adamcpu_adam ...............cpu_adam............... [YES]cpu_adam............... [YES] [YES]............ ............... ...... [OKAY][OKAY] [YES][OKAY] [OKAY]............. [OKAY][OKAY][OKAY] ...... [OKAY] fused_lambfused_lamb[NO] ................................. fused_lamb [NO][NO] [OKAY]........................... [NO][OKAY][OKAY] fused_adamfused_adam fused_adamfused_adam ............. ............. .......................... [NO] [NO] [NO] ....... [NO]....... ....... [OKAY] ....... [OKAY] [OKAY][OKAY]....... fused_lamb[OKAY] fused_lamb fused_adamfused_adam fused_adam.......................... fused_adam[NO] [NO]............. .................... ....... [NO][OKAY] [NO] sparse_attn ............sparse_attn sparse_attn [NO] ........................sparse_attn....... ............[NO][NO][OKAY] [NO].............. .......[OKAY][OKAY] transformer[OKAY] .............fused_lamb [NO]..........................fused_lamb ....... [NO][NO] ............. ....... [NO][OKAY] ....... [OKAY][OKAY]....... .......[OKAY] fused_lamb [OKAY]....... fused_lamb............. [OKAY]............. ............transformer transformer............transformer[NO] ............[NO]................... [NO].......[NO][OKAY] [OKAY] fused_lamb[NO] .......[NO]............. [OKAY][NO] .......[OKAY]....... [OKAY][OKAY] sparse_attn ............ [NO] ....... [OKAY]sparse_attn fused_lamb....... .................... [OKAY][OKAY] stochastic_transformer stochastic_transformer. stochastic_transformer stochastic_transformer [NO]. ........[NO] . [NO] [OKAY][NO]....... ....... .......[OKAY] [OKAY][OKAY] sparse_attn............sparse_attn transformer ............[NO]........................ .......[NO][NO] [NO] [OKAY]..................... [NO] .......sparse_attn [OKAY]............ [NO] .......sparse_attn [OKAY] [OKAY][OKAY]transformer[OKAY] sparse_attn............ transformer ............ [NO]............ [NO]sparse_attn .......[NO] ....... ............ [OKAY] ....... [OKAY][NO] ............ transformer[NO] stochastic_transformertransformer................... [OKAY][NO]............. [OKAY]transformer ....... ............transformer [OKAY][NO]............ stochastic_transformer ....... [NO] [NO] [OKAY]stochastic_transformer ....... ....... [OKAY][OKAY]. ....... [NO] [OKAY] transformer stochastic_transformer [NO] .stochastic_transformer....... [NO][OKAY] . ....... [NO][OKAY] . ....... ............[NO] stochastic_transformer [OKAY][NO] ....... .[OKAY]....... ....... [OKAY] [NO] stochastic_transformer [OKAY] ....... .[OKAY] [NO]stochastic_transformer ....... [OKAY]. [NO] ....... 
[OKAY] ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- op nameop name op name................op name ................................installed................ ..installed installedinstalledcompatible .... ..-------------------------------------------------- compatible compatiblecompatible-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- cpu_adam cpu_adam............... cpu_adam...............cpu_adam[YES] [YES] ............... ............... ...... [YES][OKAY]...... [YES]......[OKAY] ...... [OKAY][OKAY] fused_adam ............. [NO] fused_adam....... .............[OKAY]fused_adam fused_adam [NO] ............. fused_lamb ....... ............. .............[NO] [OKAY] [NO] [NO]....... fused_lamb..............[OKAY] .............[OKAY] [OKAY] [NO] fused_lamb....... .............fused_lamb [OKAY] [NO] .................... sparse_attn [NO] [OKAY] ................... [NO][OKAY] .......sparse_attn [OKAY]............ [NO] ....... sparse_attntransformer[OKAY] ........................transformer sparse_attn [NO] [NO]............................... [NO].......[OKAY] [NO] .......[OKAY] [OKAY].......stochastic_transformer transformer [OKAY]............ .stochastic_transformer [NO]transformer.[NO] ..........................[NO] [OKAY][OKAY][NO] ....... [OKAY]....... stochastic_transformer[OKAY] . stochastic_transformer[NO] ....... .[OKAY] [NO] ....... [OKAY] ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ 
[OKAY][OKAY][OKAY] [OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- op nameop nameop name op name ................ ................ ................................ installed installed installed ..installed .. .. ..compatible compatible compatible compatible------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- cpu_adamcpu_adam cpu_adam ...............cpu_adam ..............................[YES] ............... [YES] ...... [YES][YES] ...... [OKAY]......[OKAY] ...... [OKAY][OKAY] fused_adam fused_adam............. fused_adam[NO]fused_adam............. ....... .......................... [NO] [OKAY] [NO][NO] ....... .......fused_lamb[OKAY] ....... ............. [OKAY] [OKAY] [NO] fused_lamb fused_lamb....... fused_lamb ............. .............[OKAY]............. [NO][NO][NO] ..................... [OKAY][OKAY][OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attntransformer sparse_attn ............sparse_attn ............ ............ [NO] [NO]............ .......[NO] .......[NO] ....... [OKAY][OKAY] ....... [OKAY] [OKAY]stochastic_transformer transformer.transformertransformer [NO]............ ............ [NO]............ ....... [NO] [NO]....... [OKAY] ....... ....... [OKAY] [OKAY] [OKAY] stochastic_transformerstochastic_transformerstochastic_transformer ... [NO][NO][NO] ..................... [OKAY][OKAY][OKAY] ---------------------------------------------------------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- ---------------------------------------------------------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja ------------------------------------------------------------------------------------------------------------------------------------------------------ JIT compiled ops requires ninjaJIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja .................. .................................... 
..................[OKAY] [OKAY][OKAY][OKAY] -------------------------------------------------- -------------------------------------------------- --------------------------------------------------op name--------------------------------------------------op name ................op name................op name ................installed................installed ..installedinstalled.. compatiblecompatible.. .. -------------------------------------------------- --------------------------------------------------compatible compatible ---------------------------------------------------------------------------------------------------- cpu_adam cpu_adam............... [YES]............... cpu_adamcpu_adam [YES]..................... [OKAY] ..................... [OKAY] [YES][YES] ............ [OKAY][OKAY] fused_adam fused_adam............. .............[NO] [NO]....... fused_adamfused_adam....... [OKAY] ............. .............[OKAY] [NO] fused_lamb [NO] fused_lamb.................... ....................[NO][OKAY] ....... [NO][OKAY][OKAY] fused_lamb ....... fused_lamb .............[OKAY] .............[NO] [NO]....... .......[OKAY] [OKAY]sparse_attn ............ [NO] ....... sparse_attn[OKAY] ............ [NO]transformer ................... sparse_attnsparse_attn[OKAY] [NO] ............transformer ............ .......[NO] ............ [OKAY][NO] ....... [NO] [OKAY] .......stochastic_transformer ....... [OKAY] transformer.[OKAY] ............[NO]transformer stochastic_transformer[NO] ....... ............ ........ [OKAY] [NO][OKAY] [NO] .............. stochastic_transformer[OKAY][OKAY] .stochastic_transformer [NO] ........ [NO][OKAY] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja-------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninja .................. .................. 
ninja ..................[OKAY][OKAY] [OKAY]..................---------------------------------------------------------------------------------------------------- [OKAY] -------------------------------------------------- op nameop name -------------------------------------------------- op name................ ................ op nameinstalled................installed ................installed.. .. installed compatiblecompatible.. .. --------------------------------------------------compatible--------------------------------------------------compatible ---------------------------------------------------------------------------------------------------- cpu_adamcpu_adam ..............................cpu_adamcpu_adam [YES]...............[YES] ......[YES]............... ...... [OKAY] ...... [YES] [OKAY] [OKAY] ...... [OKAY] fused_adam ............. fused_adamfused_adam[NO] fused_adam....... ............. ............. [OKAY] .............[NO][NO] .......[NO].......fused_lamb [OKAY].................... [OKAY] [NO] [OKAY].......fused_lamb [OKAY].............fused_lamb fused_lamb [NO].......................... .......[NO][NO] [OKAY].............. sparse_attn [OKAY][OKAY]............ [NO] ....... [OKAY] sparse_attntransformer ........................ [NO]sparse_attn[NO] sparse_attn ................... ................... [OKAY] [NO][NO] [OKAY] .............. stochastic_transformer [OKAY] transformer[OKAY] ............. [NO]transformer[NO] ....... transformer............ ....... [OKAY][OKAY]............ [NO] [NO]....... stochastic_transformer.......[OKAY] [OKAY]. stochastic_transformer[NO] stochastic_transformer........ [OKAY][NO] . .......[NO] [OKAY]....... [OKAY] ninjaninjaninja ninja .................................... .................. ..................[OKAY] [OKAY][OKAY][OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op nameop nameop name op name ................ ................................installed ................installed..installed installed ..compatible compatible.... -------------------------------------------------- --------------------------------------------------compatiblecompatible ---------------------------------------------------------------------------------------------------- cpu_adam ............... cpu_adam[YES] ...............cpu_adam......cpu_adam ...............[YES][OKAY]............... ...... [YES] [YES][OKAY] ............ [OKAY][OKAY] fused_adam ............. [NO] ....... [OKAY]fused_adam fused_adam.............fused_adam fused_lamb ............. [NO] ............. .............[NO] ....... [NO] [NO] .......[OKAY].............. [OKAY][OKAY][OKAY] fused_lamb fused_lamb.............fused_lamb ............. [NO]............. [NO].......[NO] sparse_attn[OKAY]....... ....... ............ [OKAY] [OKAY] [NO] ....... [OKAY] transformer ............ sparse_attn[NO] ................... sparse_attn[OKAY] [NO]sparse_attn ............................... stochastic_transformer[OKAY] [NO][NO] ...............transformer [NO][OKAY]............ [OKAY]....... [NO][OKAY]transformer transformer ................... ............[OKAY][NO] [NO]....... .......[OKAY] [OKAY]stochastic_transformer .stochastic_transformerstochastic_transformer [NO] ......... [OKAY] [NO] [NO] .............. 
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
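The async_io warning also repeats once per rank and is benign unless the AIO offload path is needed; it is the only extension reported as incompatible. A hedged sketch of the fix the warning suggests, assuming root access on the node image and that the DeepSpeed-big-science checkout (its path is inferred from the deepspeed install path printed below) is then reinstalled so the op is prebuilt; DS_BUILD_AIO is DeepSpeed's opt-in build flag:

# Install the missing libaio headers, then rebuild DeepSpeed with the
# async_io op precompiled instead of JIT-compiled (assumes root access
# and a writable conda environment).
apt install libaio-dev
cd /gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science
DS_BUILD_AIO=1 pip install -e .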
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
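Each rank prints an identical copy of this environment block, so one copy is enough. The version lines can be double-checked from an interactive shell with a few one-liners (a minimal sketch, again assuming the tr1-13B conda environment is active):

# Reproduce the torch / deepspeed / nvcc version lines above.
python -c "import torch; print(torch.__version__, torch.version.cuda)"
python -c "import deepspeed; print(deepspeed.__version__)"
nvcc --version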
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
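git is not on PATH inside the compute-node image (hence the /bin/sh "type: git: not found" error), so Megatron cannot resolve the commit and logs git_hash=unknown. If the hash matters for reproducibility, one workaround is to record it from the login node, where git does exist, before submitting the job (a sketch; git_hash.txt is a hypothetical file name used only for illustration):

# Capture the Megatron-DeepSpeed commit on the login node before sbatch,
# since the compute nodes cannot run git themselves.
git rev-parse --short HEAD > git_hash.txt
git rev-parse --abbrev-ref HEAD >> git_hash.txt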
[OKAY] -------------------------------------------------- DeepSpeed general environment info: DeepSpeed general environment info:torch install path ............... torch install path ...............['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']1.8.1 torch cuda versiontorch version ................................... 11.11.8.1 nvcc version .....................torch cuda version 11.2............... deepspeed install path11.1 ...........nvcc version .....................['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] 11.2 deepspeed infodeepspeed install path .............................. 0.4.2+bc17042, bc17042, big-science ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed wheel compiled w. ......deepspeed info torch 1.8, cuda 11.1................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ...............DeepSpeed general environment info: ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']torch install path ............... torch version .................... 1.8.1 ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch cuda version ............... torch version11.1 ....................nvcc version 1.8.1..................... 11.2 torch cuda versiondeepspeed install path .......................... 11.1 ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']nvcc version .....................deepspeed info 11.2................... deepspeed install path0.4.2+bc17042, bc17042, big-science ........... deepspeed wheel compiled w. ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']...... torch 1.8, cuda 11.1deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 
1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install path deepspeed install path........... 
........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed info deepspeed info................... ...................0.4.2+bc17042, bc17042, big-science 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------DeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------JIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] async_io-------------------------------------------------- ............... [NO] ....... [NO] ninjaninjaninjaninja ........................................................................ [OKAY] [OKAY][OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] --------------------------------------------------op name-------------------------------------------------- op name ................ op name ................op name installed installed ................ .................. ..installed installed compatible....compatible utils .................. [YES] ...... 
[OKAY] ----------------------------------------------------------------------------------------------------compatiblecompatible ---------------------------------------------------------------------------------------------------- quantizer .............. [NO] ....... [OKAY] cpu_adamcpu_adam ..............................cpu_adamcpu_adam [YES][YES].............................. ............[YES][YES] [OKAY][OKAY]............ -------------------------------------------------- [OKAY][OKAY] fused_adamfused_adam .......................... fused_adam[NO]fused_adam [NO] ........................................ [NO][OKAY][OKAY] [NO] ....... .......fused_lambfused_lamb[OKAY] [OKAY]..........................fused_lamb [NO][NO]fused_lamb............. ........................... [NO] [OKAY] [OKAY][NO]....... .......[OKAY] [OKAY] sparse_attnsparse_attn ............sparse_attn............sparse_attn [NO][NO] ............ .......................... [NO] [NO][OKAY] [OKAY] ....... ....... transformer [OKAY] [OKAY]transformer ............ ............transformer[NO] [NO]transformer................... ....... [NO][OKAY] ............ [OKAY] .......[NO] stochastic_transformer [OKAY]stochastic_transformer ....... .. [OKAY] stochastic_transformer[NO] [NO] .............. stochastic_transformer [OKAY]. [OKAY] .[NO] [NO]....... .......[OKAY] [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io async_io............... ...............[NO] [NO]....... .......[NO] [NO] transformer_inference .. [NO] ....... [OKAY]transformer_inference .. [NO] ....... [OKAY]utils .................. [YES] ...... [OKAY] utils ..................quantizer [YES] .................... [NO] ....... [OKAY][OKAY] -------------------------------------------------- quantizer .............. 
[NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op nameop nameop name op name ................................ ................ ................ installed installedinstalled.. installed .. compatible .. ..compatible -------------------------------------------------- compatible--------------------------------------------------compatible ---------------------------------------------------------------------------------------------------- cpu_adam ............... [YES]cpu_adam ......cpu_adam cpu_adam...............[OKAY] ...............[YES]............... [YES]......[YES] ............ [OKAY][OKAY] [OKAY]fused_adam ............. [NO] ....... [OKAY] fused_adamfused_adam ..........................fused_lamb [NO][NO]fused_adam............. .......[NO] .................... [OKAY] .......[NO] [OKAY] [OKAY] fused_lamb....... [OKAY]............. fused_lamb [NO]............. fused_lamb ....... [NO] ............. [OKAY] .......[NO] sparse_attn [OKAY]............ .......[NO] [OKAY]....... [OKAY] sparse_attn transformer............ ............ sparse_attn [NO] [NO] ............ ....... ....... [NO][OKAY][OKAY]sparse_attn ....... transformerstochastic_transformer............ [OKAY]............ [NO] .[NO].......transformer [NO] [OKAY]....... ............ ....... [NO][OKAY][OKAY] transformer....... stochastic_transformer ............ [OKAY] [NO]. [NO] ....... .......stochastic_transformer[OKAY] [OKAY] . [NO] ....... [OKAY] stochastic_transformer . [NO] ....... 
[OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path ............... ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version torch version.................... ....................1.8.1 1.8.1 torch cuda version torch cuda version............... ...............11.1 11.1nvcc version nvcc version..................... .....................11.2 11.2deepspeed install path deepspeed install path........... ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed info ...................deepspeed info 0.4.2+bc17042, bc17042, big-science................... deepspeed wheel compiled w.0.4.2+bc17042, bc17042, big-science ...... deepspeed wheel compiled w.torch 1.8, cuda 11.1 ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------DeepSpeed C++/CUDA extension op report --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- JIT compiled ops requires ninja--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
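Per the report, only `cpu_adam` was prebuilt (`[YES]`); `fused_adam`, `fused_lamb`, `sparse_attn` and the transformer ops will be JIT-compiled by ninja on first use. If the JIT step is a concern (e.g. on nodes without a compiler toolchain), DeepSpeed's `DS_BUILD_*` switches let you prebuild ops at install time; a sketch, assuming the standard flags also apply to the pinned big-science fork:

```bash
# Prebuild selected ops into the install instead of relying on runtime JIT.
# DS_BUILD_* are DeepSpeed's documented build switches; whether the
# big-science fork honors all of them is an assumption to verify.
cd /gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science
DS_BUILD_FUSED_ADAM=1 DS_BUILD_FUSED_LAMB=1 pip install -e . --no-cache-dir
```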
[WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
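The only op reported as unusable is `async_io` (`[NO] ....... [NO]`), and the warning names the fix itself. `async_io` backs DeepSpeed's NVMe offload path, so the warning is benign unless offload is enabled; on a machine where you control the packages it can be silenced as the log suggests:

```bash
# Install the libaio headers named in the warning, then re-check the report.
# (Users cannot run apt on this cluster; this applies to self-managed boxes.)
sudo apt install libaio-dev
ds_report   # async_io should now show as compatible
```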
-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja-------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninjaninjaninjaninja .................. .................................... ..................[OKAY][OKAY][OKAY] [OKAY] -------------------------------------------------- ---------------------------------------------------------------------------------------------------- op name--------------------------------------------------op name op name ................op name ................ ................installed................ installed..installed compatible ..installed.. -------------------------------------------------- compatible .. compatible --------------------------------------------------compatible ---------------------------------------------------------------------------------------------------- cpu_adam ............... [YES] cpu_adam...... ...............[OKAY] cpu_adam [YES]cpu_adam .................................... [YES][YES][OKAY] fused_adam ...... ...... ............. [OKAY] [OKAY] [NO] ....... fused_adam[OKAY] ............. [NO] fused_lamb....... fused_adam[OKAY]fused_adam............. ..........................[NO]fused_lamb [NO].......[NO]............. .......[NO][OKAY]....... [OKAY][OKAY] ....... [OKAY] fused_lambfused_lamb .......................... sparse_attn[NO][NO] .......................... [NO][OKAY][OKAY] sparse_attn ....... ............[OKAY] [NO] ....... transformer[OKAY] ............ [NO]transformersparse_attn ...................sparse_attn ........................[OKAY] [NO] [NO][NO]....... [OKAY]stochastic_transformer ....... ....... . stochastic_transformer[NO][OKAY] [OKAY] ....... . transformer [OKAY] transformer[NO] ............ .......[NO]............ [OKAY]....... [NO][OKAY] ....... [OKAY] stochastic_transformer stochastic_transformer . .[NO] [NO]....... 
.......[OKAY] [OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report-------------------------------------------------- ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------JIT compiled ops requires ninja --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninjaninjaninjaninja .................. .................................... .................. [OKAY][OKAY] [OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- op name op name op nameop name................ ................................................installed installedinstalledinstalled .... compatible.. .. compatible compatiblecompatible -------------------------------------------------- ---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adamcpu_adam cpu_adam...............cpu_adam ............... [YES]............... ............... [YES][YES]...... [YES] ......[OKAY]...... ...... [OKAY] [OKAY] [OKAY] fused_adam fused_adam............. fused_adam.............[NO] fused_adam ............. 
[NO]....... ............. .......[OKAY][NO] [NO][OKAY]....... fused_lamb[OKAY]....... .............fused_lamb[OKAY] fused_lamb[NO] ............. ....................[NO] fused_lamb[NO].......[OKAY] ....... .............[OKAY] [OKAY][NO] ....... [OKAY] sparse_attn ............ [NO]sparse_attnsparse_attn ............................... [OKAY]sparse_attn[NO][NO] .......................... transformer [NO] [OKAY] [OKAY] ................... [OKAY][NO] transformer.......transformertransformer [OKAY].................................... [NO][NO] [NO] .............. stochastic_transformer....... [OKAY] [OKAY] [OKAY]. [NO] stochastic_transformer....... stochastic_transformer stochastic_transformer [OKAY] . ..[NO] [NO][NO]....... ..............[OKAY] [OKAY][OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... async_io[NO] ...................... [NO][NO] ....... [NO] transformer_inference ..transformer_inference [NO].. .......[NO] [OKAY]....... [OKAY] utils .................. utils[YES] ........................ [YES][OKAY] ...... [OKAY] quantizer quantizer.............. ..............[NO] [NO]....... .......[OKAY] [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info deepspeed info................... ...................0.4.2+bc17042, bc17042, big-science 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version ....................torch version 1.8.1.................... 
1.8.1 torch cuda version ...............torch cuda version 11.1............... 11.1nvcc version nvcc version..................... .....................11.2 11.2deepspeed install path deepspeed install path........... ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info deepspeed info................... ...................0.4.2+bc17042, bc17042, big-science 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.1 nvcc version ..................... 11.111.2 nvcc versiondeepspeed install path ................................ 11.2 ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed install path deepspeed info........... ...................['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] 0.4.2+bc17042, bc17042, big-sciencedeepspeed info deepspeed wheel compiled w.................... ......0.4.2+bc17042, bc17042, big-science torch 1.8, cuda 11.1deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. 
[WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO]async_io ....... ...............[NO] [NO] ....... [NO] transformer_inference .. [NO]transformer_inference ....... ..[OKAY] [NO] ....... [OKAY] utils .................. [YES] ...... utils[OKAY] .................. [YES]quantizer .................... [OKAY][NO] ....... [OKAY] quantizer .............. --------------------------------------------------[NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference ..transformer_inference [NO].. .......[NO] [OKAY]....... [OKAY] utils ..................utils [YES] ........................ [YES][OKAY] ...... [OKAY] quantizer .............. [NO]quantizer ..................... [OKAY][NO] ....... [OKAY]-------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... 
[NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: DeepSpeed general environment info: DeepSpeed general environment info:torch install path ............... torch install path ............... torch install path ...............['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']torch version ....................['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version1.8.1 ....................torch version torch cuda version 1.8.1 .................... ............... 1.8.1torch cuda version11.1 ...............nvcc versiontorch cuda version 11.1 ............... ..................... nvcc version 11.1 11.2 ..................... nvcc version deepspeed install path 11.2 ..................... ........... deepspeed install path 11.2 ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ...........deepspeed install path deepspeed info........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed info0.4.2+bc17042, bc17042, big-science deepspeed info...................deepspeed wheel compiled w. .........................0.4.2+bc17042, bc17042, big-science 0.4.2+bc17042, bc17042, big-sciencetorch 1.8, cuda 11.1deepspeed wheel compiled w. deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ 
torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install path torch install path............... ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.1 1.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ------------------------------------------------------------------------------------------------------------------------------------------------------ NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report -------------------------------------------------- -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- DeepSpeed C++/CUDA extension op report --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninjaninjaninjaninja ...................................................... .................. [OKAY][OKAY] [OKAY][OKAY] -------------------------------------------------- ---------------------------------------------------------------------------------------------------- -------------------------------------------------- op nameop name op nameop name ................................ ................................ installed installed installedinstalled .. .. ..compatible.. ninjaninjaninjaninja ........................................................................ 
[OKAY][OKAY][OKAY][OKAY] --------------------------------------------------compatiblecompatiblecompatible -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- -------------------------------------------------- op nameop name op nameop name ................................ ................installed................installed installed.. installed.... compatiblecompatible ..compatible -------------------------------------------------- -------------------------------------------------- --------------------------------------------------compatible -------------------------------------------------- cpu_adam ............... [YES]cpu_adam cpu_adam...... cpu_adam[OKAY] ............... cpu_adam ............... [YES]cpu_adam cpu_adamcpu_adam...... ...............[OKAY]............... ............... ............... ............... [YES] [YES] [YES] ...... ............ [OKAY]fused_adam[OKAY][OKAY] [YES][YES][YES] ............ ......[OKAY] [OKAY][OKAY]fused_adam ............. [NO] ....... [OKAY] ............. [NO] ....... [OKAY]fused_adam fused_adamfused_lamb fused_adam............. fused_adam ............. .............[NO].............[NO] .......[NO][NO]....... [OKAY] .......[OKAY] .......[OKAY]fused_lamb [OKAY]............. fused_adam.............fused_adam ..........................fused_lamb[NO] [NO].......[NO]............. .......[OKAY]....... [NO] [OKAY][OKAY]fused_lamb....... fused_lamb[NO] ....................fused_lamb sparse_attn [OKAY][NO]............. .............fused_lamb [OKAY]fused_lamb[NO] ............ .......[NO][NO] [OKAY].............. .......................... ....... [NO] [NO] [OKAY] ....... [OKAY][OKAY] sparse_attntransformer ........................ [NO][NO] .............. [OKAY]sparse_attn[OKAY] .......[OKAY] [OKAY]sparse_attn ............ [NO] ....... [OKAY] transformersparse_attn............ stochastic_transformer............ ............ [NO].[NO] [NO] [NO] ............................ [OKAY][OKAY][OKAY][OKAY] sparse_attn transformersparse_attn ............ ............ sparse_attn............ [NO][NO] ............ [NO]....... ....... [NO][OKAY].......[OKAY] [OKAY]....... transformer [OKAY]stochastic_transformer transformerstochastic_transformer transformer............ . ............ [NO] [NO] [NO] .............. .......[OKAY][OKAY] [OKAY] ............transformer [NO].transformer ............ [NO] ....... ............ ....... [NO] [OKAY] [OKAY][NO] ....... stochastic_transformer .stochastic_transformer [NO] ........ [OKAY][NO] ....... [OKAY] stochastic_transformer.......[OKAY] [OKAY]. stochastic_transformer[NO] stochastic_transformer....... . [OKAY] [NO]. .......[NO] [OKAY]....... [OKAY] DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install pathDeepSpeed general environment info: ............... torch install path['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ............... torch version .................... 1.8.1 ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch cuda version ...............torch version 11.1.................... 1.8.1nvcc version ..................... torch cuda version11.2 ...............deepspeed install path 11.1........... nvcc version ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']..................... 11.2deepspeed info deepspeed install path................... ...........0.4.2+bc17042, bc17042, big-science ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed wheel compiled w. ......deepspeed info torch 1.8, cuda 11.1................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report------------------------------------------------------------------------------------------------------------------------------------------------------ --------------------------------------------------DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.---------------------------------------------------------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 
1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninjaninjaninja ninja...................................................... .................. [OKAY][OKAY][OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- op name op name................op nameop name ................................................installed installed..installedinstalled ..compatible.... compatible--------------------------------------------------compatible compatible---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES] cpu_adam...... cpu_adam...............[OKAY] cpu_adam [YES]............... .....................[YES] [OKAY]......[YES] [OKAY]fused_adam...... ............. [OKAY][NO]fused_adam .................... fused_adam [OKAY] [NO] ............. .......[NO] [OKAY]fused_lamb....... fused_adam .............fused_lamb [OKAY] .......................... [NO] [NO]fused_lamb.......[NO] ..............[OKAY]............. [OKAY][NO] [OKAY]....... [OKAY] fused_lamb .............sparse_attn [NO]............ sparse_attn[NO]....... ...................sparse_attn [OKAY][NO]............[OKAY] [NO]....... transformer.......[OKAY] ............[OKAY] [NO]transformer transformer...................sparse_attn ............[OKAY][NO] ............[NO] ....... [NO].......stochastic_transformer[OKAY] [OKAY]....... . stochastic_transformer [OKAY]stochastic_transformer [NO] ......... [NO][NO][OKAY] transformer....... ............ ....... [OKAY] [NO] [OKAY] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja ninjaninjaninjaninja ...................................................... .................. [OKAY] [OKAY][OKAY] [OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op name op nameop name ................op name................................ installedinstalled................installed ......installed compatible compatiblecompatible.. ------------------------------------------------------------------------------------------------------------------------------------------------------compatible -------------------------------------------------- cpu_adam cpu_adam...............cpu_adam cpu_adam[YES] ............... .....................[YES]............... [OKAY][YES]......[YES] [OKAY]............ [OKAY] [OKAY] fused_adam ............. [NO]fused_adam .................... [OKAY]fused_adamfused_adam[NO] .................... .............fused_lamb[OKAY][NO] [NO] ............. ....... fused_lamb....... [NO] [OKAY] .............[OKAY]....... [NO][OKAY] .......fused_lamb fused_lamb [OKAY] ............. ............. [NO][NO] .............. [OKAY]sparse_attn[OKAY] ............ [NO] .......sparse_attn [OKAY]............ [NO] transformer....... ............[OKAY] sparse_attnsparse_attn [NO] ............................... transformer [NO] [NO][OKAY] ............ ....... ....... [NO] [OKAY][OKAY]....... stochastic_transformer [OKAY]transformer . transformer ............[NO] stochastic_transformer [NO]............ ....... . .......[NO] [OKAY] [NO] [OKAY] ....... ....... [OKAY][OKAY]stochastic_transformer . [NO]stochastic_transformer ....... .[OKAY] [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja-------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninja JIT compiled ops requires ninja ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.---------------------------------------------------------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninjaJIT compiled ops requires ninja JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY][OKAY] [OKAY] [OKAY]-------------------------------------------------- -------------------------------------------------- ----------------------------------------------------------------------------------------------------op name op name op name op name................ ................ ................ ................installedinstalled installed installed .. ....compatible ..compatible compatible -------------------------------------------------- --------------------------------------------------compatible -------------------------------------------------- -------------------------------------------------- cpu_adam cpu_adam...............cpu_adam cpu_adam...............[YES] ...............[YES]...... ............... ......[YES] [OKAY] [YES] [OKAY]...... ...... [OKAY][OKAY] fused_adam ............. fused_adam[NO] fused_adamfused_adam .................... .............[OKAY][NO]............. [NO].......[NO] fused_lamb ....... [OKAY]....... ............. 
ninjaninjaninjaninja ........................................................................ [OKAY] [OKAY] [OKAY]fused_lamb [NO] .................... fused_lambfused_lamb [NO][OKAY]............. [OKAY][OKAY][OKAY] -------------------------------------------------- ---------------------------------------------------------------------------------------------------- op name-------------------------------------------------- ....................[NO] [OKAY] [NO] ....... .......[OKAY] op name [OKAY] ................op nameop name................ installed................installed................ installed.... installed ..compatiblecompatible .. compatible ----------------------------------------------------------------------------------------------------compatible sparse_attn ............ sparse_attnsparse_attn[NO]sparse_attn ............ ................... ............ [NO][NO] [OKAY] -------------------------------------------------- -------------------------------------------------- [NO].............. .......transformer[OKAY] [OKAY] ............[OKAY] transformer cpu_adam ...............cpu_adam cpu_adam [YES]cpu_adam ............... .....................[YES]............... [YES] [OKAY][YES] ...... transformer[NO] ...............................transformer [NO][NO]............[OKAY] ...... [OKAY]......[OKAY] [OKAY] ..............[NO] [OKAY][OKAY]....... stochastic_transformer [OKAY] fused_adam ............. [NO] .......fused_adam fused_adam fused_adam[OKAY] ............. .stochastic_transformer stochastic_transformer [NO] stochastic_transformer . ....... . [NO] . [OKAY][NO]....... [NO][OKAY]....... ............. [NO].............fused_lamb[NO] .......[NO].................... .......[OKAY][OKAY][NO] [OKAY]....... fused_lamb[OKAY] fused_lambfused_lamb............. .......[OKAY] [OKAY] [NO].......................... .......[NO][NO] [OKAY]....... .......sparse_attn[OKAY] [OKAY]............ [NO] ....... [OKAY] transformer ............ [NO]sparse_attn sparse_attn ............................... sparse_attn [OKAY][NO] [NO] ............ ....... ....... stochastic_transformer[NO] [OKAY][OKAY] . ....... transformer[OKAY][NO]transformer ...................transformer............ [OKAY][NO] [NO]............ ..............[NO] [OKAY] [OKAY] ....... [OKAY] stochastic_transformer stochastic_transformer . stochastic_transformer.[NO] [NO]........ [OKAY].......[NO] [OKAY]....... [OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
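The same facts can be read programmatically instead of scraping the log; this is a minimal sketch using only stable attributes of torch and deepspeed (the `ds_report` utility that DeepSpeed installs alongside the package prints the fuller report, op table included):

# Sketch: print the key fields of "DeepSpeed general environment info".
import torch
import deepspeed

print("torch install path ...", list(torch.__path__))    # .../site-packages/torch
print("torch version ........", torch.__version__)       # 1.8.1 in this run
print("torch cuda version ...", torch.version.cuda)      # 11.1 in this run
print("deepspeed install path", list(deepspeed.__path__))
print("deepspeed info .......", deepspeed.__version__)   # 0.4.2+bc17042 here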
sparse_attnsparse_attn [OKAY] ........................ ............ [NO][NO][NO]transformer ................................. [OKAY][NO][OKAY][OKAY] ....... transformer transformer[OKAY]transformer .................................... [NO]stochastic_transformer[NO] [NO] ..................... [OKAY].[OKAY][OKAY] [NO] stochastic_transformer....... stochastic_transformerstochastic_transformer[OKAY] . [NO]. ........ [NO] [NO] [OKAY] ....... ....... [OKAY][OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] /bin/sh: line 0: type: git: not found deepspeed info ................... 
0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** using world size: 256, data-parallel-size: 8, tensor-model-parallel size: 4, pipeline-model-parallel size: 8 using torch.float16 for parameters ... ------------------------ arguments ------------------------ accumulate_allreduce_grads_in_fp32 .............. False adam_beta1 ...................................... 0.9 adam_beta2 ...................................... 0.999 adam_eps ........................................ 1e-08 adlr_autoresume ................................. False adlr_autoresume_interval ........................ 1000 apply_query_key_layer_scaling ................... True apply_residual_connection_post_layernorm ........ False attention_dropout ............................... 0.1 attention_softmax_in_fp32 ....................... False bert_binary_head ................................ True bert_load ....................................... None bf16 ............................................ False bias_dropout_fusion ............................. True bias_gelu_fusion ................................ True biencoder_projection_dim ........................ 0 biencoder_shared_query_context_model ............ False block_data_path ................................. None checkpoint_activations .......................... True checkpoint_in_cpu ............................... False checkpoint_num_layers ........................... 1 clip_grad ....................................... 
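The reported world size is consistent with the three parallelism degrees on the line above: 8 (data) x 4 (tensor) x 8 (pipeline) = 256 GPUs. A minimal sanity-check sketch of that decomposition; the variable names are illustrative, not Megatron-DeepSpeed's own:

```python
# Sanity check of the 3D-parallel layout reported in the log.
data_parallel = 8        # independent replicas of the model
tensor_parallel = 4      # GPUs sharding each layer's weights
pipeline_parallel = 8    # pipeline stages

world_size = data_parallel * tensor_parallel * pipeline_parallel
assert world_size == 256  # matches "using world size: 256"

gpus_per_replica = tensor_parallel * pipeline_parallel
print(f"{world_size} GPUs total, {gpus_per_replica} GPUs per model replica, "
      f"{data_parallel} replicas")
```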
------------------------ arguments ------------------------
accumulate_allreduce_grads_in_fp32 .............. False
adam_beta1 ...................................... 0.9
adam_beta2 ...................................... 0.999
adam_eps ........................................ 1e-08
adlr_autoresume ................................. False
adlr_autoresume_interval ........................ 1000
apply_query_key_layer_scaling ................... True
apply_residual_connection_post_layernorm ........ False
attention_dropout ............................... 0.1
attention_softmax_in_fp32 ....................... False
bert_binary_head ................................ True
bert_load ....................................... None
bf16 ............................................ False
bias_dropout_fusion ............................. True
bias_gelu_fusion ................................ True
biencoder_projection_dim ........................ 0
biencoder_shared_query_context_model ............ False
block_data_path ................................. None
checkpoint_activations .......................... True
checkpoint_in_cpu ............................... False
checkpoint_num_layers ........................... 1
clip_grad ....................................... 1.0
codecarbon_dir .................................. /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/tr8-104B-data/codecarbon
consumed_train_samples .......................... 0
consumed_valid_samples .......................... 0
contigious_checkpointing ........................ False
cpu_optimizer ................................... False
cpu_torch_adam .................................. False
data_impl ....................................... mmap
data_parallel_size .............................. 8
data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
dataloader_type ................................. single
DDP_impl ........................................ local
decoder_seq_length .............................. None
deepscale ....................................... False
deepscale_config ................................ None
deepspeed ....................................... True
deepspeed_activation_checkpointing .............. True
deepspeed_config ................................ ./ds_config.1161730.json
deepspeed_mpi ................................... False
distribute_checkpointed_activations ............. False
distributed_backend ............................. nccl
embedding_path .................................. None
encoder_seq_length .............................. 2048
eod_mask_loss ................................... False
eval_interval ................................... 1000
eval_iters ...................................... 5
evidence_data_path .............................. None
exit_duration_in_mins ........................... 110
exit_interval ................................... None
ffn_hidden_size ................................. 20480
finetune ........................................ False
fp16 ............................................ True
fp16_lm_cross_entropy ........................... False
fp32_residual_connection ........................ False
global_batch_size ............................... 2048
hidden_dropout .................................. 0.1
hidden_size ..................................... 16384
hysteresis ...................................... 2
ict_head_size ................................... None
ict_load ........................................ None
img_dim ......................................... 224
indexer_batch_size .............................. 128
indexer_log_interval ............................ 1000
init_method_std ................................. 0.02
init_method_xavier_uniform ...................... False
initial_loss_scale .............................. 4294967296
kv_channels ..................................... 512
layernorm_epsilon ............................... 1e-05
lazy_mpu_init ................................... None
load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
local_rank ...................................... 0
log_batch_size_to_tensorboard ................... True
log_interval .................................... 1
log_learning_rate_to_tensorboard ................ True
log_loss_scale_to_tensorboard ................... True
log_num_zeros_in_grad ........................... False
log_params_norm ................................. False
log_timers_to_tensorboard ....................... True
log_validation_ppl_to_tensorboard ............... True
loss_scale ...................................... 12.0
loss_scale_window ............................... 1000
lr .............................................. 6e-05
lr_decay_iters .................................. None
lr_decay_samples ................................ 126953125
lr_decay_style .................................. cosine
lr_warmup_fraction .............................. None
lr_warmup_iters ................................. 0
lr_warmup_samples ............................... 216320
make_vocab_size_divisible_by .................... 128
mask_prob ....................................... 0.15
masked_softmax_fusion ........................... True
max_position_embeddings ......................... 2048
memory_centric_tiled_linear ..................... False
merge_file ...................................... /gpfswork/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/data/gpt2-merges.txt
micro_batch_size ................................ 1
min_loss_scale .................................. 1.0
min_lr .......................................... 6e-06
mmap_warmup ..................................... False
no_load_optim ................................... None
no_load_rng ..................................... None
no_save_optim ................................... None
no_save_rng ..................................... None
num_attention_heads ............................. 32
num_channels .................................... 3
num_classes ..................................... 1000
num_layers ...................................... 32
num_layers_per_virtual_pipeline_stage ........... None
num_workers ..................................... 2
onnx_safe ....................................... None
openai_gelu ..................................... False
optimizer ....................................... adam
override_lr_scheduler ........................... False
params_dtype .................................... torch.float16
partition_activations ........................... False
patch_dim ....................................... 16
pipeline_model_parallel_size .................... 8
position_embedding_type ......................... PositionEmbeddingType.absolute
profile_backward ................................ False
query_in_block_prob ............................. 0.1
rampup_batch_size ............................... ['16', '16', '6_000_000']
rank ............................................ 0
remote_device ................................... none
reset_attention_mask ............................ False
reset_position_ids .............................. False
retriever_report_topk_accuracies ................ []
retriever_score_scaling ......................... False
retriever_seq_length ............................ 256
sample_rate ..................................... 1.0
save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
save_interval ................................... 1500
scatter_gather_tensors_in_pipeline .............. True
scattered_embeddings ............................ False
seed ............................................ 42
seq_length ...................................... 2048
sgd_momentum .................................... 0.9
short_seq_prob .................................. 0.1
split ........................................... 949,50,1
split_transformers .............................. False
synchronize_each_layer .......................... False
tensor_model_parallel_size ...................... 4
tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/tr8-104B-data/tensorboard
tensorboard_log_interval ........................ 1
tensorboard_queue_size .......................... 5
tile_factor ..................................... 1
titles_data_path ................................ None
tokenizer_name_or_path .......................... None
tokenizer_type .................................. GPT2BPETokenizer
train_iters ..................................... None
train_samples ................................... 300000000
use_checkpoint_lr_scheduler ..................... False
use_contiguous_buffers_in_ddp ................... False
use_cpu_initialization .......................... None
use_one_sent_docs ............................... False
use_pin_memory .................................. False
virtual_pipeline_model_parallel_size ............ None
vocab_extra_ids ................................. 0
vocab_file ...................................... /gpfswork/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/data/gpt2-vocab.json
weight_decay .................................... 0.1
world_size ...................................... 256
zero_allgather_bucket_size ...................... 0.0
zero_contigious_gradients ....................... False
zero_reduce_bucket_size ......................... 0.0
zero_reduce_scatter ............................. False
zero_stage ...................................... 1
-------------------- end of arguments ---------------------
will use batch size rampup starting from global batch size 16 to global batch size 2048 with batch size increments 16 over 6000000 samples.
> building GPT2BPETokenizer tokenizer ...
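The ramp-up announced above corresponds to rampup_batch_size = ['16', '16', '6_000_000'] in the arguments: start at 16, grow in steps of 16, spread over 6,000,000 samples. A simplified sketch of that schedule, assuming the increments are spaced evenly over the ramp; `global_batch_size` is an illustrative helper, not Megatron-DeepSpeed's exact implementation:

```python
# Linear batch-size ramp-up: 16 -> 2048 in increments of 16 over 6M samples.
START, INCREMENT, RAMP_SAMPLES, FINAL = 16, 16, 6_000_000, 2048

def global_batch_size(consumed_samples: int) -> int:
    num_increments = (FINAL - START) // INCREMENT          # 127 steps
    samples_per_increment = RAMP_SAMPLES / num_increments  # ~47,244 samples each
    steps = int(consumed_samples / samples_per_increment)
    return min(START + steps * INCREMENT, FINAL)

print(global_batch_size(0))          # 16
print(global_batch_size(3_000_000))  # 1024, halfway up the ramp
print(global_batch_size(6_000_000))  # 2048, ramp complete
```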
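The arguments also pin down a sample-based learning-rate schedule: linear warmup over lr_warmup_samples = 216,320 samples, then cosine decay from lr = 6e-05 down to min_lr = 6e-06 over lr_decay_samples = 126,953,125. A hedged sketch of that shape, assuming decay progress is measured from the end of warmup; the actual Megatron-DeepSpeed scheduler may handle the boundaries differently:

```python
import math

LR, MIN_LR = 6e-05, 6e-06
WARMUP_SAMPLES, DECAY_SAMPLES = 216_320, 126_953_125

def learning_rate(consumed_samples: int) -> float:
    if consumed_samples < WARMUP_SAMPLES:
        # Linear warmup from 0 to the peak LR.
        return LR * consumed_samples / WARMUP_SAMPLES
    # Cosine decay from LR to MIN_LR, clamped once decay completes.
    progress = min((consumed_samples - WARMUP_SAMPLES)
                   / (DECAY_SAMPLES - WARMUP_SAMPLES), 1.0)
    return MIN_LR + (LR - MIN_LR) * 0.5 * (1 + math.cos(math.pi * progress))

print(learning_rate(216_320))      # 6e-05 at the end of warmup
print(learning_rate(126_953_125))  # 6e-06 once decay completes
```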
[OKAY] > padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]async_io ............... [NO] ....... [NO] transformer_inference .. [NO] transformer_inference....... ..[OKAY] [NO] ....... [OKAY]utils .................. [YES] ...... [OKAY]utils .................. [YES] quantizer...... ..............[OKAY] [NO] ....... [OKAY] quantizer .............. [NO]-------------------------------------------------- ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizerquantizer ............................ [NO][NO] .............. 
[OKAY][OKAY] ---------------------------------------------------------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------ --------------------------------------------------DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninjaJIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ...................................................... [OKAY][OKAY].................. [OKAY]----------------------------------------------------------------------------------------------------[OKAY] --------------------------------------------------op nameop name --------------------------------------------------................op name................ ................installedop nameinstalled ..installed.................. compatiblecompatible .. installed --------------------------------------------------compatible .. --------------------------------------------------compatible -------------------------------------------------- -------------------------------------------------- cpu_adam cpu_adam............... [YES]cpu_adam............... ......[YES] ............... [OKAY] ...... [YES] [OKAY]cpu_adam ...... [OKAY] ...............fused_adam [YES].............fused_adam [NO]fused_adam .................... [OKAY][NO]...... .................... fused_lamb[NO][OKAY] ....................[OKAY] fused_lamb[NO][OKAY] .................... fused_lamb[OKAY][NO] .................... [NO][OKAY] ....... [OKAY] fused_adam ............. [NO] .......sparse_attn ............ sparse_attn[NO]sparse_attn ............................... [OKAY] [NO] [NO][OKAY] .......transformer ...................[OKAY] fused_lamb[OKAY] [NO]transformer .................... ............[NO][OKAY]transformer .......[NO]............ .......stochastic_transformer[NO] [OKAY]........[OKAY] [OKAY][NO] stochastic_transformer....... stochastic_transformer[OKAY] . .[NO] [NO]....... .......[OKAY] [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninjaninjaninjaninja .................. ...................................................... 
[OKAY][OKAY][OKAY][OKAY] ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- op nameop name op nameop name ................ ................ ................................installedinstalled installed..installed.. .... compatible compatible compatiblecompatible -------------------------------------------------- ---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adamcpu_adamcpu_adam ...............cpu_adam.............................. [YES] ...............[YES] [YES] ...... ......[YES] ...... [OKAY]......[OKAY] [OKAY][OKAY] fused_adam .............fused_adam fused_adam fused_adam[NO] ............. ............. .......[NO] ............. [NO][OKAY] ....... [NO].......[OKAY] fused_lamb[OKAY]....... .............[OKAY]fused_lamb [NO]fused_lamb............. fused_lamb ....... .............[NO] ............. [OKAY][NO]....... [NO] ....... ....... [OKAY] [OKAY] [OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attnsparse_attn transformersparse_attn............ ............[NO]........................ [NO] .......[NO][NO] [OKAY] ..................... [OKAY]transformer[OKAY] [OKAY] ............ transformer [NO]stochastic_transformer ............transformer....... . [NO] ............ [OKAY] [NO][NO] ....... ....... ....... [OKAY] stochastic_transformer [OKAY][OKAY] stochastic_transformer. [NO]stochastic_transformer . ....... [OKAY].[NO] [NO]....... .......[OKAY] [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
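Every rank prints this same environment block. The key facts — torch 1.8.1 with CUDA 11.1, nvcc 11.2, and the big-science DeepSpeed fork at 0.4.2+bc17042 — can also be read programmatically; a minimal sketch using only standard attributes of torch and deepspeed:

# Sketch: reproduce the key fields of the environment block above.
import torch
import deepspeed

print("torch version      :", torch.__version__)      # 1.8.1 here
print("torch cuda version :", torch.version.cuda)     # 11.1 here
print("deepspeed info     :", deepspeed.__version__)  # 0.4.2+bc17042 here
print("torch install path :", torch.__path__)         # list, as in the log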
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
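`git` is not on the PATH inside the job environment, so Megatron's startup banner cannot resolve the commit it is running from and records the hash and branch as unknown; training itself is unaffected, only the provenance stamp is lost. A sketch of the kind of lookup that produces this fallback (get_git_info is a hypothetical helper for illustration, not Megatron's actual code):

# Sketch: git metadata lookup with a fallback, as the log above suggests.
# get_git_info is a hypothetical helper, not Megatron's actual function.
import shutil
import subprocess

def get_git_info():
    if shutil.which("git") is None:  # the "type: git: not found" case
        return "unknown", "unknown"
    run = lambda *a: subprocess.check_output(a, text=True).strip()
    git_hash = run("git", "rev-parse", "--short", "HEAD")
    git_branch = run("git", "rev-parse", "--abbrev-ref", "HEAD")
    return git_hash, git_branch

h, b = get_git_info()
print(f"**** Git info for Megatron: git_hash={h} git_branch={b} ****")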
> setting codecarbon ...
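The `> setting codecarbon ...` line marks the point where the training script initializes its CodeCarbon energy/emissions tracker. A minimal sketch of stock codecarbon usage, assuming the standard EmissionsTracker API rather than the exact wiring in this script (output_dir and the toy workload are assumptions):

# Sketch: standard CodeCarbon usage; the actual script's wiring may differ.
from codecarbon import EmissionsTracker

tracker = EmissionsTracker(output_dir="codecarbon")  # output_dir: an assumption
tracker.start()
try:
    for _ in range(10_000):  # stand-in workload for the training loop
        pass
finally:
    emissions = tracker.stop()  # returns the estimated kg CO2eq
    print(f"emissions: {emissions} kg CO2eq")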
......[OKAY] [OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install path torch install path............... ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ...... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 
11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda version torch cuda version............... ...............11.1 11.1nvcc version .....................nvcc version 11.2..................... deepspeed install path11.2 ...........deepspeed install path ...........['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ...................deepspeed info 0.4.2+bc17042, bc17042, big-science................... deepspeed wheel compiled w.0.4.2+bc17042, bc17042, big-science ......deepspeed wheel compiled w. torch 1.8, cuda 11.1...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 > initializing torch distributed ... 
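Each rank prints its own copy of the op report and environment block above, which is why the raw stream interleaves them. The same report can be regenerated offline on a single node; a minimal sketch, assuming the ds_report console script that ships with DeepSpeed is on PATH:

    # Sketch: regenerate the op-compatibility / environment report above
    # on one node. `ds_report` is installed alongside DeepSpeed itself.
    import subprocess

    subprocess.run(["ds_report"], check=True)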
> setting tensorboard ...
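The async_io op stays [NO] in the report above because the compute nodes lack libaio-dev. Whether a given op could be JIT-compiled can also be probed programmatically; a hedged sketch, assuming the deepspeed.ops.op_builder interface of the 0.4.x era and its is_compatible() method:

    # Sketch: probe one op's JIT compatibility, mirroring the
    # [OKAY]/[NO] column of the report above. AsyncIOBuilder and
    # is_compatible() are assumed from DeepSpeed 0.4.x op_builder.
    from deepspeed.ops.op_builder import AsyncIOBuilder

    # Returns False here until libaio-dev is installed on the node.
    print("async_io compatible:", AsyncIOBuilder().is_compatible())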
> initializing tensor model parallel with size 4
> initializing pipeline model parallel with size 8
> setting random seeds to 42 ...
[2021-09-24 02:11:21,021] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2760 and data parallel seed: 42
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/data'
>>> done with dataset index builder. Compilation time: 0.307 seconds
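The seed line above is plain arithmetic rather than magic: Megatron offsets the base seed by a constant 2718 and adds the model-parallel rank, so rank 0 with base seed 42 gets 2760, while data-parallel state keeps the base seed. A sketch of that derivation (the helper name is illustrative, not Megatron's API):

    # Sketch: per-rank seed derivation matching the log line above
    # (model parallel seed: 2760, data parallel seed: 42).
    def derive_seeds(base_seed: int, model_parallel_rank: int):
        offset = base_seed + 2718          # constant offset used by Megatron
        model_parallel_seed = offset + model_parallel_rank
        data_parallel_seed = base_seed     # shared across model-parallel ranks
        return model_parallel_seed, data_parallel_seed

    assert derive_seeds(42, 0) == (2760, 42)  # matches the logged values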
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
!! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! WARNING !!
warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/fused_kernels/build/build.ninja...
Building extension module fused_mix_prec_layer_norm_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
>>> done with compiling and loading fused kernels. Compilation time: 27.029 seconds
time to initialize megatron (seconds): 68.551
[after megatron is initialized] datetime: 2021-09-24 02:11:48
building GPT model ...
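The compiler warning repeated above is emitted once per process and is harmless in this run: the fused kernels were already built (ninja: no work to do) and merely loaded. When a rebuild is actually needed, torch.utils.cpp_extension reads the CXX environment variable to pick the compiler, and MAX_JOBS caps ninja's workers, as the log itself notes. A minimal sketch, to be run in the launcher before the kernels are compiled:

    # Sketch: silence the WRONG_COMPILER_WARNING and bound the JIT build.
    # Both variables are read by torch.utils.cpp_extension at build time.
    import os

    os.environ["CXX"] = "g++"     # match the compiler PyTorch was built with
    os.environ["MAX_JOBS"] = "4"  # illustrative ninja worker cap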
[2021-09-24 02:11:48,760] [INFO] [utils.py:680:see_memory_usage] Before Building Model
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/cuda/memory.py:373: FutureWarning: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved
  warnings.warn(
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/cuda/memory.py:381: FutureWarning: torch.cuda.max_memory_cached has been renamed to torch.cuda.max_memory_reserved
  warnings.warn(
[2021-09-24 02:11:48,762] [INFO] [utils.py:681:see_memory_usage] MA 0.0 GB  Max_MA 0.0 GB  CA 0.0 GB  Max_CA 0 GB
[2021-09-24 02:11:48,763] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 37.77 GB, percent = 20.2%
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ..., ProcessCoord(pipe=7, data=7, model=3): 255} (256 entries over an 8x8x4 pipe x data x model grid; rank = pipe*32 + data*4 + model)
[2021-09-24 02:11:50,155] [INFO] [module.py:360:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=7
     0: _to_float16
     1: EmbeddingPipe
     2:
     3: ParallelTransformerLayerPipe
     4: ParallelTransformerLayerPipe
     5: ParallelTransformerLayerPipe
     6: ParallelTransformerLayerPipe
stage=1 layers=4
     7: ParallelTransformerLayerPipe
     8: ParallelTransformerLayerPipe
     9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
stage=2 layers=4
    11: ParallelTransformerLayerPipe
    12: ParallelTransformerLayerPipe
    13: ParallelTransformerLayerPipe
    14: ParallelTransformerLayerPipe
stage=3 layers=4
    15: ParallelTransformerLayerPipe
    16: ParallelTransformerLayerPipe
    17: ParallelTransformerLayerPipe
    18: ParallelTransformerLayerPipe
stage=4 layers=4
    19: ParallelTransformerLayerPipe
    20: ParallelTransformerLayerPipe
    21: ParallelTransformerLayerPipe
    22: ParallelTransformerLayerPipe
stage=5 layers=4
    23: ParallelTransformerLayerPipe
    24: ParallelTransformerLayerPipe
    25: ParallelTransformerLayerPipe
    26: ParallelTransformerLayerPipe
stage=6 layers=4
    27: ParallelTransformerLayerPipe
    28: ParallelTransformerLayerPipe
    29: ParallelTransformerLayerPipe
    30: ParallelTransformerLayerPipe
stage=7 layers=8
    31: ParallelTransformerLayerPipe
    32: ParallelTransformerLayerPipe
    33: ParallelTransformerLayerPipe
    34: ParallelTransformerLayerPipe
    35:
    36: MixedFusedLayerNorm
    37: EmbeddingPipe
    38: float16_to_fp32
  loss: CrossEntropy
> number of parameters on (tensor, pipeline) model parallel rank (t, p): 1745293312 for every t=0..3 at pipeline stages p=1..6
> number of parameters on (tensor, pipeline) model parallel rank (t, 7): 1986498560 for every t=0..3
> number of parameters on (tensor, pipeline) model parallel rank (t, 0): 1986465792 for every t=0..3
[2021-09-24 02:11:51,439] [INFO] [utils.py:680:see_memory_usage] After Building Model
[2021-09-24 02:11:51,440] [INFO] [utils.py:681:see_memory_usage] MA 3.77 GB  Max_MA 3.79 GB  CA 3.79 GB  Max_CA 4 GB
[2021-09-24 02:11:51,441] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 37.96 GB, percent = 20.3%
setting training iterations to 159576
> learning rate decay style: cosine
DeepSpeed is enabled.
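The topology dump above is just a row-major enumeration of the 8 x 8 x 4 (pipe x data x model) process grid, so rank = pipe*32 + data*4 + model. A minimal sketch reproducing the mapping (not DeepSpeed's actual ProcessTopology class, only the arithmetic it encodes):

from itertools import product

# 8 pipeline stages x 8 data-parallel replicas x 4 tensor-parallel ranks = 256 processes
PIPE, DATA, MODEL = 8, 8, 4

topology = {coord: rank
            for rank, coord in enumerate(product(range(PIPE), range(DATA), range(MODEL)))}

assert topology[(0, 0, 0)] == 0
assert topology[(1, 0, 0)] == 32                  # pipe stride = DATA * MODEL
assert topology[(2, 1, 3)] == 2*32 + 1*4 + 3      # == 71, matching the log
assert topology[(7, 7, 3)] == 255                 # last coordinate -> last rank

The partition that follows is equally mechanical: the 32 ParallelTransformerLayerPipe layers are spread 4 per stage, and the embedding/loss layers are pinned to the first and last stages, which is why those two stages list 7 and 8 entries and carry slightly more parameters.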
[2021-09-24 02:11:51,495] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.4.2+bc17042, git-hash=bc17042, git-branch=big-science
[2021-09-24 02:11:51,606] [INFO] [engine.py:179:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-09-24 02:11:51,606] [INFO] [engine.py:736:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-09-24 02:11:51,606] [INFO] [engine.py:741:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-09-24 02:11:51,606] [INFO] [engine.py:750:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2021-09-24 02:11:51,607] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2021-09-24 02:11:51,607] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2021-09-24 02:11:51,607] [INFO] [stage2.py:106:__init__] Reduce bucket size 500000000
[2021-09-24 02:11:51,607] [INFO] [stage2.py:107:__init__] Allgather bucket size 500000000
[2021-09-24 02:11:51,607] [INFO] [stage2.py:108:__init__] CPU Offload: False
[2021-09-24 02:11:51,607] [INFO] [stage2.py:109:__init__] Round robin gradient partitioning: False
[2021-09-24 02:11:56,299] [INFO] [stage2.py:419:__init__] optimizer state initialized
[2021-09-24 02:11:56,299] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2021-09-24 02:11:56,299] [INFO] [engine.py:553:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2021-09-24 02:11:56,299] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2021-09-24 02:11:56,300] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-09-24 02:11:56,300] [INFO] [config.py:900:print] DeepSpeedEngine configuration:
  activation_checkpointing_config  { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false }
  aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
  allreduce_always_fp32 ........ False
  amp_enabled .................. False
  amp_params ................... False
  checkpoint_tag_validation_enabled  True
  checkpoint_tag_validation_fail  False
  disable_allgather ............ False
  dump_state ................... False
  dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
  eigenvalue_enabled ........... False
  eigenvalue_gas_boundary_resolution  1
  eigenvalue_layer_name ........ bert.encoder.layer
  eigenvalue_layer_num ......... 0
  eigenvalue_max_iter .......... 100
  eigenvalue_stability ......... 1e-06
  eigenvalue_tol ............... 0.01
  eigenvalue_verbose ........... False
  elasticity_enabled ........... False
  flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null }
  fp16_enabled ................. True
  fp16_mixed_quantize .......... False
  global_rank .................. 0
  gradient_accumulation_steps .. 256
  gradient_clipping ............ 1.0
  gradient_predivide_factor .... 1.0
  initial_dynamic_scale ........ 4096
  loss_scale ................... 0
  memory_breakdown ............. False
  optimizer_legacy_fusion ...... False
  optimizer_name ............... None
  optimizer_params ............. None
  pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
  pld_enabled .................. False
  pld_params ................... False
  prescale_gradients ........... False
  quantize_change_rate ......... 0.001
  quantize_groups .............. 1
  quantize_offset .............. 1000
  quantize_period .............. 1000
  quantize_rounding ............ 0
  quantize_start_bits .......... 16
  quantize_target_bits ......... 8
  quantize_training_enabled .... False
  quantize_type ................ 0
  quantize_verbose ............. False
  scheduler_name ............... None
  scheduler_params ............. None
  sparse_attention ............. None
  sparse_gradients_enabled ..... False
  steps_per_print .............. 2000
  tensorboard_enabled .......... False
  tensorboard_job_name ......... DeepSpeedJobName
  tensorboard_output_path ......
  train_batch_size ............. 2048
  train_micro_batch_size_per_gpu  1
  use_quantizer_kernel ......... False
  wall_clock_breakdown ......... False
  world_size ................... 8
  zero_allow_untested_optimizer  False
  zero_config .................. { "stage": 1, "contiguous_gradients": false, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": true, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_fp16_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false }
  zero_enabled ................. True
  zero_optimization_stage ...... 1
  json = { "train_micro_batch_size_per_gpu": 1, "train_batch_size": 2.048000e+03, "gradient_clipping": 1.0, "zero_optimization": { "stage": 1 }, "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12 }, "steps_per_print": 2.000000e+03, "wall_clock_breakdown": false }
[2021-09-24 02:11:56,302] [INFO] [engine.py:76:__init__] CONFIG: micro_batches=256 micro_batch_size=1
[2021-09-24 02:11:56,606] [INFO] [engine.py:134:__init__] RANK=0..3     STAGE=0 LAYERS=7 [0, 7)    STAGE_PARAMS=1986465792 (1986.466M)
[2021-09-24 02:11:56,606] [INFO] [engine.py:134:__init__] RANK=32..35   STAGE=1 LAYERS=4 [7, 11)   STAGE_PARAMS=1745293312 (1745.293M)
[2021-09-24 02:11:56,606] [INFO] [engine.py:134:__init__] RANK=64..67   STAGE=2 LAYERS=4 [11, 15)  STAGE_PARAMS=1745293312 (1745.293M)
[2021-09-24 02:11:56,606] [INFO] [engine.py:134:__init__] RANK=96..99   STAGE=3 LAYERS=4 [15, 19)  STAGE_PARAMS=1745293312 (1745.293M)
[2021-09-24 02:11:56,606] [INFO] [engine.py:134:__init__] RANK=128..131 STAGE=4 LAYERS=4 [19, 23)  STAGE_PARAMS=1745293312 (1745.293M)
[2021-09-24 02:11:56,606] [INFO] [engine.py:134:__init__] RANK=160..163 STAGE=5 LAYERS=4 [23, 27)  STAGE_PARAMS=1745293312 (1745.293M)
[2021-09-24 02:11:56,606] [INFO] [engine.py:134:__init__] RANK=192..195 STAGE=6 LAYERS=4 [27, 31)  STAGE_PARAMS=1745293312 (1745.293M)
[2021-09-24 02:11:56,606] [INFO] [engine.py:134:__init__] RANK=224..227 STAGE=7 LAYERS=8 [31, 39)  STAGE_PARAMS=1986498560 (1986.499M)
(every reporting rank shows TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M))
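Two quick consistency checks against the numbers printed above: the global batch size is the product of the micro-batch size, the gradient-accumulation steps, and the data-parallel width, and TOTAL_PARAMS is the per-stage parameter count summed over the 8 pipeline stages and multiplied by the 4 tensor-parallel ranks. A small sketch, plain arithmetic only:

# train_batch_size = micro_batch_size * gradient_accumulation_steps * data-parallel size
micro_batch_size = 1      # train_micro_batch_size_per_gpu
grad_accum = 256          # gradient_accumulation_steps (a.k.a. micro_batches)
dp_size = 8               # the "world_size" in the config print is the data-parallel width
assert micro_batch_size * grad_accum * dp_size == 2048          # train_batch_size

# TOTAL_PARAMS: stages 0 and 7 also carry the tied embedding, stages 1..6 are pure transformer blocks
stage_params = [1986465792] + [1745293312] * 6 + [1986498560]   # per tensor-parallel rank
assert 4 * sum(stage_params) == 57778896896                     # TOTAL_PARAMS
# UNIQUE_PARAMS is lower, presumably because the tied embedding (EmbeddingPipe
# appears on both end stages in the partition above) is counted twice:
print(4 * sum(stage_params) - 56814206976)                      # -> 964689920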
[2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint.
[the same warning was emitted by every rank]
WARNING: could not find the metadata file /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints will not load any checkpoints and will start from random
[2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
[2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
[2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
[2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
[2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
[2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
[2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
[2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
[2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
[2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
[2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
[2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
[2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
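This warning is expected on a first run: DeepSpeed decides which checkpoint to resume from by reading a plain-text `latest` file inside the checkpoint directory, and nothing has been saved yet. A minimal sketch of the lookup, assuming the standard DeepSpeed layout (the engine calls are shown commented out, and `global_step1000` is an illustrative tag, not one from this run):

```python
import os

def read_latest_tag(checkpoint_dir: str):
    """Mimic DeepSpeed's lookup: the `latest` file holds the tag
    (sub-directory name) of the most recently saved checkpoint."""
    latest_path = os.path.join(checkpoint_dir, "latest")
    if not os.path.isfile(latest_path):
        return None  # fresh run: nothing saved yet, hence the warning above
    with open(latest_path) as f:
        return f.read().strip()

ckpt_dir = "/gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints"
tag = read_latest_tag(ckpt_dir)
# engine.load_checkpoint(ckpt_dir)                         # resolves via `latest`
# engine.load_checkpoint(ckpt_dir, tag="global_step1000")  # explicit tag instead
```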
time (ms) | load-checkpoint: 1.91
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-09-24 02:11:56
> building train, validation, and test datasets ...
 > datasets target sizes (minimum size):
    train:      300000000
    validation: 1638400
    test:       10240
> building train, validation, and test datasets for GPT ...
 > building dataset index ...
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
> finished creating indexed dataset in 0.214922 seconds
    number of documents: 304230423
 > dataset split:
    train:
     document indices in [0, 288714672) total of 288714672 documents
    validation:
     document indices in [288714672, 303926193) total of 15211521 documents
    test:
     document indices in [303926193, 304230423) total of 304230 documents
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_shuffle_idx.npy
    loaded indexed file in 0.337 seconds
    total number of samples: 394611670
    total number of epochs: 3
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_42s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_42s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_42s_shuffle_idx.npy
    loaded indexed file in 0.309 seconds
    total number of samples: 6927161
    total number of epochs: 1
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_42s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_42s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_42s_shuffle_idx.npy
    loaded indexed file in 0.060 seconds
    total number of samples: 137384
    total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-09-24 02:12:03
done with setup ...
training ...
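The three document ranges above can be reproduced from the document count alone with Megatron-style split arithmetic: round each split's share of the corpus, then shift every boundary by the accumulated rounding error so the last index lands exactly on the corpus size. A sketch assuming a `949,50,1` split (the weights are inferred from the printed boundaries, not read from this run's arguments):

```python
def train_valid_test_split(size, weights):
    """Megatron-style split: round each share, then shift all boundaries
    by the rounding error so the final index equals `size` exactly."""
    total = sum(weights)
    index = [0]
    for w in weights:
        index.append(index[-1] + int(round(w / total * size)))
    diff = index[-1] - size
    return [0] + [i - diff for i in index[1:]]

print(train_valid_test_split(304230423, [949, 50, 1]))
# -> [0, 288714672, 303926193, 304230423], matching the logged ranges
```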
time (ms) | model-and-optimizer-setup: 8062.72 | train/valid/test-data-iterators-setup: 5729.09
[before the start of training step] datetime: 2021-09-24 02:12:03
[2021-09-24 02:12:03,365] [INFO] [checkpointing.py:408:forward] Activation Checkpointing Information
[2021-09-24 02:12:03,365] [INFO] [checkpointing.py:409:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-09-24 02:12:03,365] [INFO] [checkpointing.py:412:forward] ----contiguous Memory Checkpointing False with 32 total layers
[2021-09-24 02:12:03,365] [INFO] [checkpointing.py:415:forward] ----Synchronization False
[2021-09-24 02:12:03,365] [INFO] [checkpointing.py:416:forward] ----Profiling time in checkpointing False
[Rank 1] (after 1 iterations) memory (MB) | allocated: 6661.611328125 | max allocated: 11742.55810546875 | reserved: 21150.0 | max reserved: 21150.0
[Rank 33] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 65] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 97] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 225] (after 1 iterations) memory (MB) | allocated: 7107.70751953125 | max allocated: 11884.6845703125 | reserved: 22492.0 | max reserved: 22492.0
[Rank 129] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 193] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18586.0 | max reserved: 18586.0
[Rank 161] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 2] (after 1 iterations) memory (MB) | allocated: 6661.611328125 | max allocated: 11742.55810546875 | reserved: 21150.0 | max reserved: 21150.0
[Rank 34] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 226] (after 1 iterations) memory (MB) | allocated: 7107.70751953125 | max allocated: 11884.6845703125 | reserved: 21700.0 | max reserved: 21700.0
[Rank 66] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18778.0 | max reserved: 18778.0
[Rank 98] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18586.0 | max reserved: 18586.0
[Rank 130] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 194] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18650.0 | max reserved: 18650.0
[Rank 162] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 0] (after 1 iterations) memory (MB) | allocated: 6661.611328125 | max allocated: 11742.55810546875 | reserved: 21470.0 | max reserved: 21470.0
[Rank 64] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 19252.0 | max reserved: 19252.0
[Rank 32] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18868.0 | max reserved: 18868.0
[Rank 128] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18868.0 | max reserved: 18868.0
[Rank 96] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18868.0 | max reserved: 18868.0
[Rank 224] (after 1 iterations) memory (MB) | allocated: 7107.70751953125 | max allocated: 11884.6845703125 | reserved: 22492.0 | max reserved: 22492.0
[Rank 192] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18868.0 | max reserved: 18868.0
[Rank 160] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18868.0 | max reserved: 18868.0
[Rank 35] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 3] (after 1 iterations) memory (MB) | allocated: 6661.611328125 | max allocated: 11742.55810546875 | reserved: 21150.0 | max reserved: 21150.0
[Rank 67] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18522.0 | max reserved: 18522.0
[Rank 99] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 131] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18522.0 | max reserved: 18522.0
[Rank 227] (after 1 iterations) memory (MB) | allocated: 7107.70751953125 | max allocated: 11884.6845703125 | reserved: 21700.0 | max reserved: 21700.0
[Rank 195] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18586.0 | max reserved: 18586.0
[Rank 163] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
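The per-rank memory report above is built from PyTorch's standard CUDA counters (current vs. peak, allocated vs. reserved); a minimal sketch of an equivalent helper:

```python
import torch

def report_memory(name: str) -> None:
    """Print the same four counters as the per-rank lines above, in MB."""
    mb = 1024 * 1024
    print(f"{name} memory (MB)"
          f" | allocated: {torch.cuda.memory_allocated() / mb}"
          f" | max allocated: {torch.cuda.max_memory_allocated() / mb}"
          f" | reserved: {torch.cuda.memory_reserved() / mb}"
          f" | max reserved: {torch.cuda.max_memory_reserved() / mb}")

# report_memory("[Rank 0] (after 1 iterations)")  # requires a CUDA device
```

Note the asymmetry in the report: ranks 0-3 and 224-227 allocate more than the middle ranks, plausibly because the first and last pipeline stages also hold the embedding and LM-head weights.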
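The iteration lines that follow print learning rates of 4.438E-09, 8.876E-09, 1.331E-08, ..., i.e. the rate grows by a fixed step per 16 consumed samples. That is consistent with linear sample-based warmup toward a 6e-5 peak over 216,320 warmup samples; both numbers are inferred from the printed values, not read from this run's configuration. A sketch:

```python
PEAK_LR = 6e-5            # assumed peak learning rate
WARMUP_SAMPLES = 216_320  # assumed warmup length in samples

def warmup_lr(consumed_samples: int) -> float:
    """Linear sample-based warmup: LR scales with consumed samples."""
    return PEAK_LR * min(consumed_samples, WARMUP_SAMPLES) / WARMUP_SAMPLES

for it in range(1, 6):  # global batch size 16, as in the log below
    print(f"iteration {it}: lr = {warmup_lr(16 * it):.3E}")
# iteration 1: lr = 4.438E-09   (matches the logged rates)
# iteration 2: lr = 8.876E-09
```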
[Rank 32] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18868.0 | max reserved: 18868.0
[Rank 128] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18868.0 | max reserved: 18868.0
[Rank 96] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18868.0 | max reserved: 18868.0
[Rank 224] (after 1 iterations) memory (MB) | allocated: 7107.70751953125 | max allocated: 11884.6845703125 | reserved: 22492.0 | max reserved: 22492.0
[Rank 192] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18868.0 | max reserved: 18868.0
[Rank 160] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18868.0 | max reserved: 18868.0
[Rank 35] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 3] (after 1 iterations) memory (MB) | allocated: 6661.611328125 | max allocated: 11742.55810546875 | reserved: 21150.0 | max reserved: 21150.0
[Rank 67] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18522.0 | max reserved: 18522.0
[Rank 99] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 131] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18522.0 | max reserved: 18522.0
[Rank 227] (after 1 iterations) memory (MB) | allocated: 7107.70751953125 | max allocated: 11884.6845703125 | reserved: 21700.0 | max reserved: 21700.0
[Rank 195] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18586.0 | max reserved: 18586.0
[Rank 163] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
iteration 1/ 159576 | consumed samples: 16 | elapsed time per iteration (ms): 31536.2 | learning rate: 4.438E-09 | global batch size: 16 | lm loss: 1.426722E+01 | loss scale: 4096.0 | grad norm: 1863985.704 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2/ 159576 | consumed samples: 32 | elapsed time per iteration (ms): 13049.6 | learning rate: 8.876E-09 | global batch size: 16 | lm loss: 1.429125E+01 | loss scale: 4096.0 | grad norm: 1882741.499 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3/ 159576 | consumed samples: 48 | elapsed time per iteration (ms): 13671.4 | learning rate: 1.331E-08 | global batch size: 16 | lm loss: 1.421026E+01 | loss scale: 4096.0 | grad norm: 1871916.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4/ 159576 | consumed samples: 64 | elapsed time per iteration (ms): 13544.5 | learning rate: 1.775E-08 | global batch size: 16 | lm loss: 1.424627E+01 | loss scale: 4096.0 | grad norm: 1912485.128 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5/ 159576 | consumed samples: 80 | elapsed time per iteration (ms): 13955.0 | learning rate: 2.219E-08 | global batch size: 16 | lm loss: 1.421161E+01 | loss scale: 4096.0 | grad norm: 1873991.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
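The per-rank memory lines distinguish memory currently held by tensors ("allocated") from memory held by the CUDA caching allocator ("reserved", always the larger figure), each with its running maximum. The first and last pipeline stages (e.g. ranks 0-3 and 224-227) report more memory than the middle stages, consistent with the embedding and loss layers living there. A minimal sketch of producing such a line from PyTorch's own counters (the actual Megatron-DeepSpeed reporting helper differs in formatting):

import torch

def memory_report_mb(rank: int) -> str:
    # Same four counters as the "[Rank N] ... memory (MB)" lines above.
    mb = 1024 * 1024
    return (f"[Rank {rank}] memory (MB) | "
            f"allocated: {torch.cuda.memory_allocated() / mb} | "
            f"max allocated: {torch.cuda.max_memory_allocated() / mb} | "
            f"reserved: {torch.cuda.memory_reserved() / mb} | "
            f"max reserved: {torch.cuda.max_memory_reserved() / mb}")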
iteration 6/ 159576 | consumed samples: 96 | elapsed time per iteration (ms): 13725.9 | learning rate: 2.663E-08 | global batch size: 16 | lm loss: 1.423833E+01 | loss scale: 4096.0 | grad norm: 1889068.937 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7/ 159576 | consumed samples: 112 | elapsed time per iteration (ms): 13496.8 | learning rate: 3.107E-08 | global batch size: 16 | lm loss: 1.423929E+01 | loss scale: 4096.0 | grad norm: 1864001.655 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8/ 159576 | consumed samples: 128 | elapsed time per iteration (ms): 13565.8 | learning rate: 3.550E-08 | global batch size: 16 | lm loss: 1.424760E+01 | loss scale: 4096.0 | grad norm: 1867381.949 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 9/ 159576 | consumed samples: 144 | elapsed time per iteration (ms): 14076.3 | learning rate: 3.994E-08 | global batch size: 16 | lm loss: 1.418199E+01 | loss scale: 4096.0 | grad norm: 1902029.931 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 10/ 159576 | consumed samples: 160 | elapsed time per iteration (ms): 13497.5 | learning rate: 4.438E-08 | global batch size: 16 | lm loss: 1.412427E+01 | loss scale: 4096.0 | grad norm: 1865649.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11/ 159576 | consumed samples: 176 | elapsed time per iteration (ms): 13459.5 | learning rate: 4.882E-08 | global batch size: 16 | lm loss: 1.407386E+01 | loss scale: 4096.0 | grad norm: 1861067.628 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12/ 159576 | consumed samples: 192 | elapsed time per iteration (ms): 13581.0 | learning rate: 5.325E-08 | global batch size: 16 | lm loss: 1.400436E+01 | loss scale: 4096.0 | grad norm: 1857208.659 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 13/ 159576 | consumed samples: 208 | elapsed time per iteration (ms): 13877.0 | learning rate: 5.769E-08 | global batch size: 16 | lm loss: 1.374212E+01 | loss scale: 4096.0 | grad norm: 1860712.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 14/ 159576 | consumed samples: 224 | elapsed time per iteration (ms): 13730.6 | learning rate: 6.213E-08 | global batch size: 16 | lm loss: 1.363158E+01 | loss scale: 4096.0 | grad norm: 1835837.890 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 15/ 159576 | consumed samples: 240 | elapsed time per iteration (ms): 13589.9 | learning rate: 6.657E-08 | global batch size: 16 | lm loss: 1.353429E+01 | loss scale: 4096.0 | grad norm: 1866742.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 16/ 159576 | consumed samples: 256 | elapsed time per iteration (ms): 13709.9 | learning rate: 7.101E-08 | global batch size: 16 | lm loss: 1.346230E+01 | loss scale: 4096.0 | grad norm: 1867848.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 17/ 159576 | consumed samples: 272 | elapsed time per iteration (ms): 13515.8 | learning rate: 7.544E-08 | global batch size: 16 | lm loss: 1.257517E+01 | loss scale: 4096.0 | grad norm: 1827444.965 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
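The learning rate climbs by about 4.438E-09 per 16-sample step, i.e. a linear warmup in consumed samples. A hedged back-of-envelope check, assuming a peak learning rate of 6e-5 warmed up over 216,320 samples (neither figure appears in this log; only the batch size and the per-step increment do):

peak_lr = 6e-5            # assumption, not shown in this log
warmup_samples = 216_320  # assumption, not shown in this log
global_batch_size = 16
lr_increment_per_step = peak_lr * global_batch_size / warmup_samples
print(lr_increment_per_step)  # ~4.438e-09, matching the logged warmup slope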
iteration 18/ 159576 | consumed samples: 288 | elapsed time per iteration (ms): 13800.0 | learning rate: 7.988E-08 | global batch size: 16 | lm loss: 1.251998E+01 | loss scale: 4096.0 | grad norm: 2020558.797 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 19/ 159576 | consumed samples: 304 | elapsed time per iteration (ms): 13516.3 | learning rate: 8.432E-08 | global batch size: 16 | lm loss: 1.265157E+01 | loss scale: 4096.0 | grad norm: 2257407.748 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 20/ 159576 | consumed samples: 320 | elapsed time per iteration (ms): 13549.6 | learning rate: 8.876E-08 | global batch size: 16 | lm loss: 1.252521E+01 | loss scale: 4096.0 | grad norm: 2095375.557 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 21/ 159576 | consumed samples: 336 | elapsed time per iteration (ms): 13586.7 | learning rate: 9.320E-08 | global batch size: 16 | lm loss: 1.244903E+01 | loss scale: 4096.0 | grad norm: 2211855.540 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 22/ 159576 | consumed samples: 352 | elapsed time per iteration (ms): 14140.0 | learning rate: 9.763E-08 | global batch size: 16 | lm loss: 1.221426E+01 | loss scale: 4096.0 | grad norm: 2152853.946 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 23/ 159576 | consumed samples: 368 | elapsed time per iteration (ms): 13565.7 | learning rate: 1.021E-07 | global batch size: 16 | lm loss: 1.223387E+01 | loss scale: 4096.0 | grad norm: 2257726.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 24/ 159576 | consumed samples: 384 | elapsed time per iteration (ms): 13529.2 | learning rate: 1.065E-07 | global batch size: 16 | lm loss: 1.252795E+01 | loss scale: 4096.0 | grad norm: 2648402.060 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 25/ 159576 | consumed samples: 400 | elapsed time per iteration (ms): 13468.4 | learning rate: 1.109E-07 | global batch size: 16 | lm loss: 1.249682E+01 | loss scale: 4096.0 | grad norm: 2816711.826 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 26/ 159576 | consumed samples: 416 | elapsed time per iteration (ms): 13529.9 | learning rate: 1.154E-07 | global batch size: 16 | lm loss: 1.219784E+01 | loss scale: 4096.0 | grad norm: 2380750.659 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 27/ 159576 | consumed samples: 432 | elapsed time per iteration (ms): 13833.4 | learning rate: 1.198E-07 | global batch size: 16 | lm loss: 1.182601E+01 | loss scale: 4096.0 | grad norm: 2116005.650 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 28/ 159576 | consumed samples: 448 | elapsed time per iteration (ms): 13615.6 | learning rate: 1.243E-07 | global batch size: 16 | lm loss: 1.159655E+01 | loss scale: 4096.0 | grad norm: 1805209.516 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 29/ 159576 | consumed samples: 464 | elapsed time per iteration (ms): 13371.2 | learning rate: 1.287E-07 | global batch size: 16 | lm loss: 1.165552E+01 | loss scale: 4096.0 | grad norm: 1731569.615 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 30/ 159576 | consumed samples: 480 | elapsed time per iteration (ms): 13604.8 | learning rate: 1.331E-07 | global batch size: 16 | lm loss: 1.154380E+01 | loss scale: 4096.0 | grad norm: 1706578.844 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 31/ 159576 | consumed samples: 496 | elapsed time per iteration (ms): 13982.3 | learning rate: 1.376E-07 | global batch size: 16 | lm loss: 1.139362E+01 | loss scale: 4096.0 | grad norm: 1757980.169 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 32/ 159576 | consumed samples: 512 | elapsed time per iteration (ms): 13306.0 | learning rate: 1.420E-07 | global batch size: 16 | lm loss: 1.148209E+01 | loss scale: 4096.0 | grad norm: 1697993.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 33/ 159576 | consumed samples: 528 | elapsed time per iteration (ms): 13575.8 | learning rate: 1.464E-07 | global batch size: 16 | lm loss: 1.140995E+01 | loss scale: 4096.0 | grad norm: 1670562.081 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 34/ 159576 | consumed samples: 544 | elapsed time per iteration (ms): 13613.2 | learning rate: 1.509E-07 | global batch size: 16 | lm loss: 1.132776E+01 | loss scale: 4096.0 | grad norm: 1643305.715 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 35/ 159576 | consumed samples: 560 | elapsed time per iteration (ms): 13869.9 | learning rate: 1.553E-07 | global batch size: 16 | lm loss: 1.136237E+01 | loss scale: 4096.0 | grad norm: 1648846.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 36/ 159576 | consumed samples: 576 | elapsed time per iteration (ms): 13789.0 | learning rate: 1.598E-07 | global batch size: 16 | lm loss: 1.143323E+01 | loss scale: 4096.0 | grad norm: 1598861.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 37/ 159576 | consumed samples: 592 | elapsed time per iteration (ms): 13658.0 | learning rate: 1.642E-07 | global batch size: 16 | lm loss: 1.115875E+01 | loss scale: 4096.0 | grad norm: 1562919.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 38/ 159576 | consumed samples: 608 | elapsed time per iteration (ms): 13961.2 | learning rate: 1.686E-07 | global batch size: 16 | lm loss: 1.117768E+01 | loss scale: 4096.0 | grad norm: 1565543.705 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 39/ 159576 | consumed samples: 624 | elapsed time per iteration (ms): 13410.4 | learning rate: 1.731E-07 | global batch size: 16 | lm loss: 1.111340E+01 | loss scale: 4096.0 | grad norm: 1536768.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 40/ 159576 | consumed samples: 640 | elapsed time per iteration (ms): 13891.8 | learning rate: 1.775E-07 | global batch size: 16 | lm loss: 1.106657E+01 | loss scale: 4096.0 | grad norm: 1548421.837 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 41/ 159576 | consumed samples: 656 | elapsed time per iteration (ms): 13633.3 | learning rate: 1.820E-07 | global batch size: 16 | lm loss: 1.094995E+01 | loss scale: 4096.0 | grad norm: 1532446.839 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 42/ 159576 | consumed samples: 672 | elapsed time per iteration (ms): 13643.8 | learning rate: 1.864E-07 | global batch size: 16 | lm loss: 1.087856E+01 | loss scale: 4096.0 | grad norm: 1531337.842 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 43/ 159576 | consumed samples: 688 | elapsed time per iteration (ms): 13630.7 | learning rate: 1.908E-07 | global batch size: 16 | lm loss: 1.084412E+01 | loss scale: 4096.0 | grad norm: 1473539.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 44/ 159576 | consumed samples: 704 | elapsed time per iteration (ms): 14118.0 | learning rate: 1.953E-07 | global batch size: 16 | lm loss: 1.114596E+01 | loss scale: 4096.0 | grad norm: 1496700.678 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 45/ 159576 | consumed samples: 720 | elapsed time per iteration (ms): 13853.8 | learning rate: 1.997E-07 | global batch size: 16 | lm loss: 1.092829E+01 | loss scale: 4096.0 | grad norm: 1454980.052 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 46/ 159576 | consumed samples: 736 | elapsed time per iteration (ms): 13549.0 | learning rate: 2.041E-07 | global batch size: 16 | lm loss: 1.074461E+01 | loss scale: 4096.0 | grad norm: 1397083.505 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 47/ 159576 | consumed samples: 752 | elapsed time per iteration (ms): 13627.3 | learning rate: 2.086E-07 | global batch size: 16 | lm loss: 1.066580E+01 | loss scale: 4096.0 | grad norm: 1311670.870 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 48/ 159576 | consumed samples: 768 | elapsed time per iteration (ms): 13674.9 | learning rate: 2.130E-07 | global batch size: 16 | lm loss: 1.055744E+01 | loss scale: 4096.0 | grad norm: 1292299.744 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 49/ 159576 | consumed samples: 784 | elapsed time per iteration (ms): 13932.1 | learning rate: 2.175E-07 | global batch size: 16 | lm loss: 1.060610E+01 | loss scale: 4096.0 | grad norm: 1283482.631 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 50/ 159576 | consumed samples: 800 | elapsed time per iteration (ms): 13665.9 | learning rate: 2.219E-07 | global batch size: 16 | lm loss: 1.063007E+01 | loss scale: 4096.0 | grad norm: 1228203.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 51/ 159576 | consumed samples: 816 | elapsed time per iteration (ms): 13667.5 | learning rate: 2.263E-07 | global batch size: 16 | lm loss: 1.046357E+01 | loss scale: 4096.0 | grad norm: 1219490.568 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 52/ 159576 | consumed samples: 832 | elapsed time per iteration (ms): 13793.6 | learning rate: 2.308E-07 | global batch size: 16 | lm loss: 1.061804E+01 | loss scale: 4096.0 | grad norm: 1197068.783 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 53/ 159576 | consumed samples: 848 | elapsed time per iteration (ms): 14209.6 | learning rate: 2.352E-07 | global batch size: 16 | lm loss: 1.041930E+01 | loss scale: 4096.0 | grad norm: 1168890.772 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 54/ 159576 | consumed samples: 864 | elapsed time per iteration (ms): 13453.2 | learning rate: 2.396E-07 | global batch size: 16 | lm loss: 1.035855E+01 | loss scale: 4096.0 | grad norm: 1126594.517 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 55/ 159576 | consumed samples: 880 | elapsed time per iteration (ms): 13666.6 | learning rate: 2.441E-07 | global batch size: 16 | lm loss: 1.051081E+01 | loss scale: 4096.0 | grad norm: 1080949.187 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 56/ 159576 | consumed samples: 896 | elapsed time per iteration (ms): 13689.5 | learning rate: 2.485E-07 | global batch size: 16 | lm loss: 1.048364E+01 | loss scale: 4096.0 | grad norm: 1069119.479 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 57/ 159576 | consumed samples: 912 | elapsed time per iteration (ms): 14289.6 | learning rate: 2.530E-07 | global batch size: 16 | lm loss: 1.048154E+01 | loss scale: 4096.0 | grad norm: 1016407.938 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 58/ 159576 | consumed samples: 928 | elapsed time per iteration (ms): 13663.2 | learning rate: 2.574E-07 | global batch size: 16 | lm loss: 1.019213E+01 | loss scale: 4096.0 | grad norm: 982402.590 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 59/ 159576 | consumed samples: 944 | elapsed time per iteration (ms): 13704.5 | learning rate: 2.618E-07 | global batch size: 16 | lm loss: 1.019982E+01 | loss scale: 4096.0 | grad norm: 965254.453 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 60/ 159576 | consumed samples: 960 | elapsed time per iteration (ms): 13846.3 | learning rate: 2.663E-07 | global batch size: 16 | lm loss: 1.021626E+01 | loss scale: 4096.0 | grad norm: 926021.764 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 61/ 159576 | consumed samples: 976 | elapsed time per iteration (ms): 13469.9 | learning rate: 2.707E-07 | global batch size: 16 | lm loss: 1.008368E+01 | loss scale: 4096.0 | grad norm: 911608.476 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 62/ 159576 | consumed samples: 992 | elapsed time per iteration (ms): 13774.9 | learning rate: 2.751E-07 | global batch size: 16 | lm loss: 9.892099E+00 | loss scale: 4096.0 | grad norm: 882114.442 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 63/ 159576 | consumed samples: 1008 | elapsed time per iteration (ms): 13514.1 | learning rate: 2.796E-07 | global batch size: 16 | lm loss: 9.876393E+00 | loss scale: 4096.0 | grad norm: 834416.962 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 64/ 159576 | consumed samples: 1024 | elapsed time per iteration (ms): 13538.5 | learning rate: 2.840E-07 | global batch size: 16 | lm loss: 9.927294E+00 | loss scale: 4096.0 | grad norm: 814691.882 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 65/ 159576 | consumed samples: 1040 | elapsed time per iteration (ms): 13496.5 | learning rate: 2.885E-07 | global batch size: 16 | lm loss: 1.024293E+01 | loss scale: 4096.0 | grad norm: 821175.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 66/ 159576 | consumed samples: 1056 | elapsed time per iteration (ms): 14030.7 | learning rate: 2.929E-07 | global batch size: 16 | lm loss: 9.930872E+00 | loss scale: 4096.0 | grad norm: 759629.854 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 67/ 159576 | consumed samples: 1072 | elapsed time per iteration (ms): 13743.1 | learning rate: 2.973E-07 | global batch size: 16 | lm loss: 9.852800E+00 | loss scale: 4096.0 | grad norm: 734440.980 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 68/ 159576 | consumed samples: 1088 | elapsed time per iteration (ms): 13293.2 | learning rate: 3.018E-07 | global batch size: 16 | lm loss: 9.786448E+00 | loss scale: 4096.0 | grad norm: 702591.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 69/ 159576 | consumed samples: 1104 | elapsed time per iteration (ms): 13515.6 | learning rate: 3.062E-07 | global batch size: 16 | lm loss: 9.917148E+00 | loss scale: 4096.0 | grad norm: 689937.545 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 70/ 159576 | consumed samples: 1120 | elapsed time per iteration (ms): 13786.0 | learning rate: 3.107E-07 | global batch size: 16 | lm loss: 9.593161E+00 | loss scale: 4096.0 | grad norm: 634541.803 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 71/ 159576 | consumed samples: 1136 | elapsed time per iteration (ms): 13761.6 | learning rate: 3.151E-07 | global batch size: 16 | lm loss: 9.685747E+00 | loss scale: 4096.0 | grad norm: 620089.160 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 72/ 159576 | consumed samples: 1152 | elapsed time per iteration (ms): 13503.1 | learning rate: 3.195E-07 | global batch size: 16 | lm loss: 9.550736E+00 | loss scale: 4096.0 | grad norm: 592735.898 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 73/ 159576 | consumed samples: 1168 | elapsed time per iteration (ms): 13574.6 | learning rate: 3.240E-07 | global batch size: 16 | lm loss: 9.780053E+00 | loss scale: 4096.0 | grad norm: 578902.468 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 74/ 159576 | consumed samples: 1184 | elapsed time per iteration (ms): 13563.6 | learning rate: 3.284E-07 | global batch size: 16 | lm loss: 9.660094E+00 | loss scale: 4096.0 | grad norm: 549632.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 75/ 159576 | consumed samples: 1200 | elapsed time per iteration (ms): 13751.3 | learning rate: 3.328E-07 | global batch size: 16 | lm loss: 9.715110E+00 | loss scale: 4096.0 | grad norm: 523457.012 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 76/ 159576 | consumed samples: 1216 | elapsed time per iteration (ms): 13613.9 | learning rate: 3.373E-07 | global batch size: 16 | lm loss: 9.548697E+00 | loss scale: 4096.0 | grad norm: 559789.568 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 77/ 159576 | consumed samples: 1232 | elapsed time per iteration (ms): 13668.9 | learning rate: 3.417E-07 | global batch size: 16 | lm loss: 9.395579E+00 | loss scale: 4096.0 | grad norm: 516053.141 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 78/ 159576 | consumed samples: 1248 | elapsed time per iteration (ms): 13540.8 | learning rate: 3.462E-07 | global batch size: 16 | lm loss: 9.450207E+00 | loss scale: 4096.0 | grad norm: 491518.990 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 79/ 159576 | consumed samples: 1264 | elapsed time per iteration (ms): 13951.5 | learning rate: 3.506E-07 | global batch size: 16 | lm loss: 9.312221E+00 | loss scale: 4096.0 | grad norm: 445025.682 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 80/ 159576 | consumed samples: 1280 | elapsed time per iteration (ms): 13710.1 | learning rate: 3.550E-07 | global batch size: 16 | lm loss: 9.362122E+00 | loss scale: 4096.0 | grad norm: 498046.459 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 81/ 159576 | consumed samples: 1296 | elapsed time per iteration (ms): 13653.8 | learning rate: 3.595E-07 | global batch size: 16 | lm loss: 9.684261E+00 | loss scale: 4096.0 | grad norm: 460137.704 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 82/ 159576 | consumed samples: 1312 | elapsed time per iteration (ms): 13416.1 | learning rate: 3.639E-07 | global batch size: 16 | lm loss: 9.111031E+00 | loss scale: 4096.0 | grad norm: 462196.098 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 83/ 159576 | consumed samples: 1328 | elapsed time per iteration (ms): 13589.7 | learning rate: 3.683E-07 | global batch size: 16 | lm loss: 9.424231E+00 | loss scale: 4096.0 | grad norm: 387492.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 84/ 159576 | consumed samples: 1344 | elapsed time per iteration (ms): 13890.8 | learning rate: 3.728E-07 | global batch size: 16 | lm loss: 9.225885E+00 | loss scale: 4096.0 | grad norm: 477146.862 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 85/ 159576 | consumed samples: 1360 | elapsed time per iteration (ms): 13578.1 | learning rate: 3.772E-07 | global batch size: 16 | lm loss: 9.449253E+00 | loss scale: 4096.0 | grad norm: 498838.088 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 86/ 159576 | consumed samples: 1376 | elapsed time per iteration (ms): 13600.8 | learning rate: 3.817E-07 | global batch size: 16 | lm loss: 9.186915E+00 | loss scale: 4096.0 | grad norm: 359821.133 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 87/ 159576 | consumed samples: 1392 | elapsed time per iteration (ms): 13578.0 | learning rate: 3.861E-07 | global batch size: 16 | lm loss: 9.169426E+00 | loss scale: 4096.0 | grad norm: 336361.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
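Throughout this stretch the loss scale sits at 4096.0 and both the skipped- and nan-iteration counters stay at zero, so no fp16 step has yet overflowed, even through the occasional grad-norm spikes. A minimal sketch of the dynamic loss-scaling policy these fields report on (constants are illustrative; the real logic lives in the mixed-precision optimizer):

class DynamicLossScaler:
    def __init__(self, init_scale=4096.0, growth_interval=1000):
        self.scale = init_scale
        self.growth_interval = growth_interval
        self.clean_steps = 0

    def update(self, found_overflow: bool) -> bool:
        """Returns True if the step must be skipped (counted in the log)."""
        if found_overflow:
            self.scale = max(self.scale / 2, 1.0)  # back off on inf/nan grads
            self.clean_steps = 0
            return True
        self.clean_steps += 1
        if self.clean_steps % self.growth_interval == 0:
            self.scale *= 2  # cautiously grow after a run of clean steps
        return False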
iteration 88/ 159576 | consumed samples: 1408 | elapsed time per iteration (ms): 14258.1 | learning rate: 3.905E-07 | global batch size: 16 | lm loss: 9.174639E+00 | loss scale: 4096.0 | grad norm: 513262.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 89/ 159576 | consumed samples: 1424 | elapsed time per iteration (ms): 13350.5 | learning rate: 3.950E-07 | global batch size: 16 | lm loss: 9.322023E+00 | loss scale: 4096.0 | grad norm: 417913.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 90/ 159576 | consumed samples: 1440 | elapsed time per iteration (ms): 13582.0 | learning rate: 3.994E-07 | global batch size: 16 | lm loss: 9.319530E+00 | loss scale: 4096.0 | grad norm: 326159.953 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 91/ 159576 | consumed samples: 1456 | elapsed time per iteration (ms): 13577.6 | learning rate: 4.038E-07 | global batch size: 16 | lm loss: 9.305362E+00 | loss scale: 4096.0 | grad norm: 312504.506 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 92/ 159576 | consumed samples: 1472 | elapsed time per iteration (ms): 13979.9 | learning rate: 4.083E-07 | global batch size: 16 | lm loss: 8.797226E+00 | loss scale: 4096.0 | grad norm: 299274.584 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 93/ 159576 | consumed samples: 1488 | elapsed time per iteration (ms): 13685.6 | learning rate: 4.127E-07 | global batch size: 16 | lm loss: 9.470177E+00 | loss scale: 4096.0 | grad norm: 889931.672 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 94/ 159576 | consumed samples: 1504 | elapsed time per iteration (ms): 13625.1 | learning rate: 4.172E-07 | global batch size: 16 | lm loss: 9.601658E+00 | loss scale: 4096.0 | grad norm: 858157.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 95/ 159576 | consumed samples: 1520 | elapsed time per iteration (ms): 13713.7 | learning rate: 4.216E-07 | global batch size: 16 | lm loss: 9.093191E+00 | loss scale: 4096.0 | grad norm: 308888.782 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 96/ 159576 | consumed samples: 1536 | elapsed time per iteration (ms): 13441.7 | learning rate: 4.260E-07 | global batch size: 16 | lm loss: 9.258781E+00 | loss scale: 4096.0 | grad norm: 285375.841 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 97/ 159576 | consumed samples: 1552 | elapsed time per iteration (ms): 13952.1 | learning rate: 4.305E-07 | global batch size: 16 | lm loss: 9.267257E+00 | loss scale: 4096.0 | grad norm: 266598.437 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 98/ 159576 | consumed samples: 1568 | elapsed time per iteration (ms): 13570.4 | learning rate: 4.349E-07 | global batch size: 16 | lm loss: 9.302748E+00 | loss scale: 4096.0 | grad norm: 430050.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 99/ 159576 | consumed samples: 1584 | elapsed time per iteration (ms): 13655.7 | learning rate: 4.393E-07 | global batch size: 16 | lm loss: 9.206352E+00 | loss scale: 4096.0 | grad norm: 522965.120 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 100/ 159576 | consumed samples: 1600 | elapsed time per iteration (ms): 13606.3 | learning rate: 4.438E-07 | global batch size: 16 | lm loss: 9.212991E+00 | loss scale: 4096.0 | grad norm: 351294.826 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 101/ 159576 | consumed samples: 1616 | elapsed time per iteration (ms): 14021.3 | learning rate: 4.482E-07 | global batch size: 16 | lm loss: 9.392309E+00 | loss scale: 4096.0 | grad norm: 249407.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 102/ 159576 | consumed samples: 1632 | elapsed time per iteration (ms): 13722.5 | learning rate: 4.527E-07 | global batch size: 16 | lm loss: 9.173745E+00 | loss scale: 4096.0 | grad norm: 230190.700 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 103/ 159576 | consumed samples: 1648 | elapsed time per iteration (ms): 13481.3 | learning rate: 4.571E-07 | global batch size: 16 | lm loss: 9.060183E+00 | loss scale: 4096.0 | grad norm: 535519.642 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 104/ 159576 | consumed samples: 1664 | elapsed time per iteration (ms): 13573.2 | learning rate: 4.615E-07 | global batch size: 16 | lm loss: 8.820353E+00 | loss scale: 4096.0 | grad norm: 252106.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 105/ 159576 | consumed samples: 1680 | elapsed time per iteration (ms): 13679.8 | learning rate: 4.660E-07 | global batch size: 16 | lm loss: 8.907228E+00 | loss scale: 4096.0 | grad norm: 227304.496 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 106/ 159576 | consumed samples: 1696 | elapsed time per iteration (ms): 13833.6 | learning rate: 4.704E-07 | global batch size: 16 | lm loss: 8.920894E+00 | loss scale: 4096.0 | grad norm: 226622.044 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 107/ 159576 | consumed samples: 1712 | elapsed time per iteration (ms): 13577.9 | learning rate: 4.749E-07 | global batch size: 16 | lm loss: 8.839094E+00 | loss scale: 4096.0 | grad norm: 188033.687 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 108/ 159576 | consumed samples: 1728 | elapsed time per iteration (ms): 13620.7 | learning rate: 4.793E-07 | global batch size: 16 | lm loss: 9.072345E+00 | loss scale: 4096.0 | grad norm: 405511.072 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 109/ 159576 | consumed samples: 1744 | elapsed time per iteration (ms): 13608.5 | learning rate: 4.837E-07 | global batch size: 16 | lm loss: 8.981932E+00 | loss scale: 4096.0 | grad norm: 326365.949 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 110/ 159576 | consumed samples: 1760 | elapsed time per iteration (ms): 13945.7 | learning rate: 4.882E-07 | global batch size: 16 | lm loss: 8.900158E+00 | loss scale: 4096.0 | grad norm: 183771.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 111/ 159576 | consumed samples: 1776 | elapsed time per iteration (ms): 13542.6 | learning rate: 4.926E-07 | global batch size: 16 | lm loss: 8.908926E+00 | loss scale: 4096.0 | grad norm: 189581.109 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 112/ 159576 | consumed samples: 1792 | elapsed time per iteration (ms): 13715.6 | learning rate: 4.970E-07 | global batch size: 16 | lm loss: 8.738115E+00 | loss scale: 4096.0 | grad norm: 176974.824 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 113/ 159576 | consumed samples: 1808 | elapsed time per iteration (ms): 13456.9 | learning rate: 5.015E-07 | global batch size: 16 | lm loss: 9.185429E+00 | loss scale: 4096.0 | grad norm: 452577.591 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 114/ 159576 | consumed samples: 1824 | elapsed time per iteration (ms): 14039.5 | learning rate: 5.059E-07 | global batch size: 16 | lm loss: 9.235853E+00 | loss scale: 4096.0 | grad norm: 567475.961 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 115/ 159576 | consumed samples: 1840 | elapsed time per iteration (ms): 13568.6 | learning rate: 5.104E-07 | global batch size: 16 | lm loss: 8.848898E+00 | loss scale: 4096.0 | grad norm: 182062.035 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 116/ 159576 | consumed samples: 1856 | elapsed time per iteration (ms): 13607.1 | learning rate: 5.148E-07 | global batch size: 16 | lm loss: 8.955499E+00 | loss scale: 4096.0 | grad norm: 179172.056 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 117/ 159576 | consumed samples: 1872 | elapsed time per iteration (ms): 13798.7 | learning rate: 5.192E-07 | global batch size: 16 | lm loss: 8.835221E+00 | loss scale: 4096.0 | grad norm: 168846.925 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 118/ 159576 | consumed samples: 1888 | elapsed time per iteration (ms): 13424.3 | learning rate: 5.237E-07 | global batch size: 16 | lm loss: 9.120043E+00 | loss scale: 4096.0 | grad norm: 304218.818 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 119/ 159576 | consumed samples: 1904 | elapsed time per iteration (ms): 13992.7 | learning rate: 5.281E-07 | global batch size: 16 | lm loss: 8.877877E+00 | loss scale: 4096.0 | grad norm: 328004.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 120/ 159576 | consumed samples: 1920 | elapsed time per iteration (ms): 13739.9 | learning rate: 5.325E-07 | global batch size: 16 | lm loss: 9.091492E+00 | loss scale: 4096.0 | grad norm: 542667.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 121/ 159576 | consumed samples: 1936 | elapsed time per iteration (ms): 13438.9 | learning rate: 5.370E-07 | global batch size: 16 | lm loss: 8.963889E+00 | loss scale: 4096.0 | grad norm: 173633.066 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 122/ 159576 | consumed samples: 1952 | elapsed time per iteration (ms): 13659.9 | learning rate: 5.414E-07 | global batch size: 16 | lm loss: 8.973601E+00 | loss scale: 4096.0 | grad norm: 154883.483 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 123/ 159576 | consumed samples: 1968 | elapsed time per iteration (ms): 14034.9 | learning rate: 5.459E-07 | global batch size: 16 | lm loss: 8.932154E+00 | loss scale: 4096.0 | grad norm: 191305.172 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 124/ 159576 | consumed samples: 1984 | elapsed time per iteration (ms): 13642.6 | learning rate: 5.503E-07 | global batch size: 16 | lm loss: 8.718765E+00 | loss scale: 4096.0 | grad norm: 141927.967 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 125/ 159576 | consumed samples: 2000 | elapsed time per iteration (ms): 13607.3 | learning rate: 5.547E-07 | global batch size: 16 | lm loss: 9.022717E+00 | loss scale: 4096.0 | grad norm: 530230.902 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 126/ 159576 | consumed samples: 2016 | elapsed time per iteration (ms): 13623.2 | learning rate: 5.592E-07 | global batch size: 16 | lm loss: 9.160154E+00 | loss scale: 4096.0 | grad norm: 525377.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 127/ 159576 | consumed samples: 2032 | elapsed time per iteration (ms): 13944.5 | learning rate: 5.636E-07 | global batch size: 16 | lm loss: 8.602621E+00 | loss scale: 4096.0 | grad norm: 180832.122 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 128/ 159576 | consumed samples: 2048 | elapsed time per iteration (ms): 13652.1 | learning rate: 5.680E-07 | global batch size: 16 | lm loss: 8.848473E+00 | loss scale: 4096.0 | grad norm: 159006.909 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 129/ 159576 | consumed samples: 2064 | elapsed time per iteration (ms): 13619.4 | learning rate: 5.725E-07 | global batch size: 16 | lm loss: 8.697285E+00 | loss scale: 4096.0 | grad norm: 166208.955 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 130/ 159576 | consumed samples: 2080 | elapsed time per iteration (ms): 13649.8 | learning rate: 5.769E-07 | global batch size: 16 | lm loss: 8.738346E+00 | loss scale: 4096.0 | grad norm: 142582.672 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 131/ 159576 | consumed samples: 2096 | elapsed time per iteration (ms): 13648.8 | learning rate: 5.814E-07 | global batch size: 16 | lm loss: 8.628532E+00 | loss scale: 4096.0 | grad norm: 119745.012 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 132/ 159576 | consumed samples: 2112 | elapsed time per iteration (ms): 13855.7 | learning rate: 5.858E-07 | global batch size: 16 | lm loss: 8.681314E+00 | loss scale: 4096.0 | grad norm: 238581.530 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 133/ 159576 | consumed samples: 2128 | elapsed time per iteration (ms): 13614.3 | learning rate: 5.902E-07 | global batch size: 16 | lm loss: 8.853155E+00 | loss scale: 4096.0 | grad norm: 190597.797 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 134/ 159576 | consumed samples: 2144 | elapsed time per iteration (ms): 13742.8 | learning rate: 5.947E-07 | global batch size: 16 | lm loss: 8.840850E+00 | loss scale: 4096.0 | grad norm: 157001.058 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 135/ 159576 | consumed samples: 2160 | elapsed time per iteration (ms): 13481.4 | learning rate: 5.991E-07 | global batch size: 16 | lm loss: 8.721090E+00 | loss scale: 4096.0 | grad norm: 120761.062 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 136/ 159576 | consumed samples: 2176 | elapsed time per iteration (ms): 14037.0 | learning rate: 6.036E-07 | global batch size: 16 | lm loss: 8.786610E+00 | loss scale: 4096.0 | grad norm: 109166.988 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 137/ 159576 | consumed samples: 2192 | elapsed time per iteration (ms): 13631.2 | learning rate: 6.080E-07 | global batch size: 16 | lm loss: 8.825349E+00 | loss scale: 4096.0 | grad norm: 393039.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 138/ 159576 | consumed samples: 2208 | elapsed time per iteration (ms): 13698.2 | learning rate: 6.124E-07 | global batch size: 16 | lm loss: 8.681873E+00 | loss scale: 4096.0 | grad norm: 210924.024 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 139/ 159576 | consumed samples: 2224 | elapsed time per iteration (ms): 13641.8 | learning rate: 6.169E-07 | global batch size: 16 | lm loss: 8.758416E+00 | loss scale: 4096.0 | grad norm: 111138.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 140/ 159576 | consumed samples: 2240 | elapsed time per iteration (ms): 13650.3 | learning rate: 6.213E-07 | global batch size: 16 | lm loss: 8.646829E+00 | loss scale: 4096.0 | grad norm: 115663.463 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 141/ 159576 | consumed samples: 2256 | elapsed time per iteration (ms): 14097.3 | learning rate: 6.257E-07 | global batch size: 16 | lm loss: 8.653087E+00 | loss scale: 4096.0 | grad norm: 142126.653 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 142/ 159576 | consumed samples: 2272 | elapsed time per iteration (ms): 13468.2 | learning rate: 6.302E-07 | global batch size: 16 | lm loss: 8.647311E+00 | loss scale: 4096.0 | grad norm: 163914.852 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 143/ 159576 | consumed samples: 2288 | elapsed time per iteration (ms): 13544.7 | learning rate: 6.346E-07 | global batch size: 16 | lm loss: 8.564240E+00 | loss scale: 4096.0 | grad norm: 159952.939 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 144/ 159576 | consumed samples: 2304 | elapsed time per iteration (ms): 13642.1 | learning rate: 6.391E-07 | global batch size: 16 | lm loss: 8.789017E+00 | loss scale: 4096.0 | grad norm: 169255.588 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 145/ 159576 | consumed samples: 2320 | elapsed time per iteration (ms): 14181.4 | learning rate: 6.435E-07 | global batch size: 16 | lm loss: 8.811962E+00 | loss scale: 4096.0 | grad norm: 127162.884 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 146/ 159576 | consumed samples: 2336 | elapsed time per iteration (ms): 13492.3 | learning rate: 6.479E-07 | global batch size: 16 | lm loss: 8.774818E+00 | loss scale: 4096.0 | grad norm: 110483.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 147/ 159576 | consumed samples: 2352 | elapsed time per iteration (ms): 13671.3 | learning rate: 6.524E-07 | global batch size: 16 | lm loss: 8.753700E+00 | loss scale: 4096.0 | grad norm: 128181.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 148/ 159576 | consumed samples: 2368 | elapsed time per iteration (ms): 13675.0 | learning rate: 6.568E-07 | global batch size: 16 | lm loss: 8.742964E+00 | loss scale: 4096.0 | grad norm: 140698.611 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 149/ 159576 | consumed samples: 2384 | elapsed time per iteration (ms): 14154.8 | learning rate: 6.612E-07 | global batch size: 16 | lm loss: 8.705631E+00 | loss scale: 4096.0 | grad norm: 284561.708 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 150/ 159576 | consumed samples: 2400 | elapsed time per iteration (ms): 13301.3 | learning rate: 6.657E-07 | global batch size: 16 | lm loss: 8.639321E+00 | loss scale: 4096.0 | grad norm: 158457.469 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 151/ 159576 | consumed samples: 2416 | elapsed time per iteration (ms): 13553.4 | learning rate: 6.701E-07 | global batch size: 16 | lm loss: 8.747204E+00 | loss scale: 4096.0 | grad norm: 217035.827 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 152/ 159576 | consumed samples: 2432 | elapsed time per iteration (ms): 13577.6 | learning rate: 6.746E-07 | global batch size: 16 | lm loss: 8.711011E+00 | loss scale: 4096.0 | grad norm: 170149.010 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 153/ 159576 | consumed samples: 2448 | elapsed time per iteration (ms): 13522.0 | learning rate: 6.790E-07 | global batch size: 16 | lm loss: 8.717499E+00 | loss scale: 4096.0 | grad norm: 103133.580 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 154/ 159576 | consumed samples: 2464 | elapsed time per iteration (ms): 13883.8 | learning rate: 6.834E-07 | global batch size: 16 | lm loss: 8.587013E+00 | loss scale: 4096.0 | grad norm: 99765.078 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 155/ 159576 | consumed samples: 2480 | elapsed time per iteration (ms): 13554.0 | learning rate: 6.879E-07 | global batch size: 16 | lm loss: 8.698885E+00 | loss scale: 4096.0 | grad norm: 282680.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 156/ 159576 | consumed samples: 2496 | elapsed time per iteration (ms): 13692.4 | learning rate: 6.923E-07 | global batch size: 16 | lm loss: 9.289864E+00 | loss scale: 4096.0 | grad norm: 609278.865 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 157/ 159576 | consumed samples: 2512 | elapsed time per iteration (ms): 13306.0 | learning rate: 6.967E-07 | global batch size: 16 | lm loss: 8.803203E+00 | loss scale: 4096.0 | grad norm: 221182.708 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-24 02:48:11] PULSE: tr8-104B is waiting to be scheduled (1159457_[1-10%1] on 'gpu_p13' partition)
[2021-09-24 02:48:11] PULSE: tr8-104B is scheduled to start in 18:26:36 (at 2021-09-24T21:14:48) (1161605 on 'gpu_p13' partition)
[2021-09-24 02:48:11] PULSE: tr8-104B is running for 37:09 since 2021-09-24T02:11:02 (1161730 on 'gpu_p13' partition (r6i4n7,r6i5n[7-8],r6i6n[0,6,8],r6i7n3,r7i2n[2,4-5],r7i3n2,r7i6n[2-4],r7i7n[3,7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i3n[0-2],r8i5n[3-4],r8i7n[3-6,8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8])
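The PULSE lines are a heartbeat from the job monitor: the tr8-104B job array 1159457 is waiting in the SLURM queue, a second job 1161605 has a projected start time, and the live job 1161730 has been running on the listed gpu_p13 nodes since 02:11:02. A hedged sketch of how such a heartbeat can be derived (the actual monitor script may differ; only the squeue invocation itself is standard SLURM):

import subprocess

def pulse(job_name: str) -> str:
    # jobid, state, elapsed time, projected start, partition, node list
    out = subprocess.run(
        ["squeue", "--name", job_name, "--noheader",
         "--format", "%i %t %M %S %P %N"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out  # post-process into "waiting / scheduled / running" lines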
iteration 158/ 159576 | consumed samples: 2528 | elapsed time per iteration (ms): 13873.2 | learning rate: 7.012E-07 | global batch size: 16 | lm loss: 8.628306E+00 | loss scale: 4096.0 | grad norm: 200507.061 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 159/ 159576 | consumed samples: 2544 | elapsed time per iteration (ms): 13466.2 | learning rate: 7.056E-07 | global batch size: 16 | lm loss: 8.632781E+00 | loss scale: 4096.0 | grad norm: 103638.607 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 160/ 159576 | consumed samples: 2560 | elapsed time per iteration (ms): 13494.3 | learning rate: 7.101E-07 | global batch size: 16 | lm loss: 8.596104E+00 | loss scale: 4096.0 | grad norm: 92105.558 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 161/ 159576 | consumed samples: 2576 | elapsed time per iteration (ms): 13517.5 | learning rate: 7.145E-07 | global batch size: 16 | lm loss: 8.408714E+00 | loss scale: 4096.0 | grad norm: 78965.627 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 162/ 159576 | consumed samples: 2592 | elapsed time per iteration (ms): 13540.1 | learning rate: 7.189E-07 | global batch size: 16 | lm loss: 9.134837E+00 | loss scale: 4096.0 | grad norm: 524949.559 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 163/ 159576 | consumed samples: 2608 | elapsed time per iteration (ms): 13879.1 | learning rate: 7.234E-07 | global batch size: 16 | lm loss: 8.601346E+00 | loss scale: 4096.0 | grad norm: 206465.490 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 164/ 159576 | consumed samples: 2624 | elapsed time per iteration (ms): 13564.5 | learning rate: 7.278E-07 | global batch size: 16 | lm loss: 8.734079E+00 | loss scale: 4096.0 | grad norm: 159985.137 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 165/ 159576 | consumed samples: 2640 | elapsed time per iteration (ms): 13607.4 | learning rate: 7.322E-07 | global batch size: 16 | lm loss: 8.629238E+00 | loss scale: 4096.0 | grad norm: 89678.564 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 166/ 159576 | consumed samples: 2656 | elapsed time per iteration (ms): 13687.7 | learning rate: 7.367E-07 | global batch size: 16 | lm loss: 8.753635E+00 | loss scale: 4096.0 | grad norm: 108761.613 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 167/ 159576 | consumed samples: 2672 | elapsed time per iteration (ms): 14101.4 | learning rate: 7.411E-07 | global batch size: 16 | lm loss: 8.647141E+00 | loss scale: 4096.0 | grad norm: 78778.670 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 168/ 159576 | consumed samples: 2688 | elapsed time per iteration (ms): 13827.5 | learning rate: 7.456E-07 | global batch size: 16 | lm loss: 8.838135E+00 | loss scale: 4096.0 | grad norm: 301360.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 169/ 159576 | consumed samples: 2704 | elapsed time per iteration (ms): 13776.5 | learning rate: 7.500E-07 | global batch size: 16 | lm loss: 8.865972E+00 | loss scale: 4096.0 | grad norm: 230779.992 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 170/ 159576 | consumed samples: 2720 | elapsed time per iteration (ms): 13667.3 | learning rate: 7.544E-07 | global batch size: 16 | lm loss: 8.716210E+00 | loss scale: 4096.0 | grad norm: 133087.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 171/ 159576 | consumed samples: 2736 | elapsed time per iteration (ms): 13974.1 | learning rate: 7.589E-07 | global batch size: 16 | lm loss: 8.726005E+00 | loss scale: 4096.0 | grad norm: 112595.632 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 172/ 159576 | consumed samples: 2752 | elapsed time per iteration (ms): 13644.3 | learning rate: 7.633E-07 | global batch size: 16 | lm loss: 8.704071E+00 | loss scale: 4096.0 | grad norm: 92111.748 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 173/ 159576 | consumed samples: 2768 | elapsed time per iteration (ms): 13586.4 | learning rate: 7.678E-07 | global batch size: 16 | lm loss: 8.823001E+00 | loss scale: 4096.0 | grad norm: 93068.020 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 174/ 159576 | consumed samples: 2784 | elapsed time per iteration (ms): 13629.3 | learning rate: 7.722E-07 | global batch size: 16 | lm loss: 8.521597E+00 | loss scale: 4096.0 | grad norm: 79887.666 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 175/ 159576 | consumed samples: 2800 | elapsed time per iteration (ms): 13647.0 | learning rate: 7.766E-07 | global batch size: 16 | lm loss: 9.370278E+00 | loss scale: 4096.0 | grad norm: 576797.121 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 176/ 159576 | consumed samples: 2816 | elapsed time per iteration (ms): 13993.8 | learning rate: 7.811E-07 | global batch size: 16 | lm loss: 9.255205E+00 | loss scale: 4096.0 | grad norm: 337846.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 177/ 159576 | consumed samples: 2832 | elapsed time per iteration (ms): 13778.2 | learning rate: 7.855E-07 | global batch size: 16 | lm loss: 9.038449E+00 | loss scale: 4096.0 | grad norm: 339366.601 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 178/ 159576 | consumed samples: 2848 | elapsed time per iteration (ms): 13515.3 | learning rate: 7.899E-07 | global batch size: 16 | lm loss: 8.771539E+00 | loss scale: 4096.0 | grad norm: 216761.610 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 179/ 159576 | consumed samples: 2864 | elapsed time per iteration (ms): 13657.6 | learning rate: 7.944E-07 | global batch size: 16 | lm loss: 8.718536E+00 | loss scale: 4096.0 | grad norm: 103470.129 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 180/ 159576 | consumed samples: 2880 | elapsed time per iteration (ms): 14095.5 | learning rate: 7.988E-07 | global batch size: 16 | lm loss: 8.968449E+00 | loss scale: 4096.0 | grad norm: 88300.652 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 181/ 159576 | consumed samples: 2896 | elapsed time per iteration (ms): 13570.0 | learning rate: 8.033E-07 | global batch size: 16 | lm loss: 8.743597E+00 | loss scale: 4096.0 | grad norm: 73637.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 182/ 159576 | consumed samples: 2912 | elapsed time per iteration (ms): 13631.2 | learning rate: 8.077E-07 | global batch size: 16 | lm loss: 8.650385E+00 | loss scale: 4096.0 | grad norm: 170612.165 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 183/ 159576 | consumed samples: 2928 | elapsed time per iteration (ms): 13666.1 | learning rate: 8.121E-07 | global batch size: 16 | lm loss: 8.764441E+00 | loss scale: 4096.0 | grad norm: 157032.537 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 184/ 159576 | consumed samples: 2944 | elapsed time per iteration (ms): 14033.7 | learning rate: 8.166E-07 | global batch size: 16 | lm loss: 8.546231E+00 | loss scale: 4096.0 | grad norm: 68818.140 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 185/ 159576 | consumed samples: 2960 | elapsed time per iteration (ms): 13755.2 | learning rate: 8.210E-07 | global batch size: 16 | lm loss: 8.605597E+00 | loss scale: 4096.0 | grad norm: 245599.472 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 186/ 159576 | consumed samples: 2976 | elapsed time per iteration (ms): 13693.9 | learning rate: 8.254E-07 | global batch size: 16 | lm loss: 8.735710E+00 | loss scale: 4096.0 | grad norm: 193090.020 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 187/ 159576 | consumed samples: 2992 | elapsed time per iteration (ms): 13666.7 | learning rate: 8.299E-07 | global batch size: 16 | lm loss: 8.800616E+00 | loss scale: 4096.0 | grad norm: 121643.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 188/ 159576 | consumed samples: 3008 | elapsed time per iteration (ms): 13617.1 | learning rate: 8.343E-07 | global batch size: 16 | lm loss: 8.450140E+00 | loss scale: 4096.0 | grad norm: 91010.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 189/ 159576 | consumed samples: 3024 | elapsed time per iteration (ms): 14107.4 | learning rate: 8.388E-07 | global batch size: 16 | lm loss: 8.680673E+00 | loss scale: 4096.0 | grad norm: 171815.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 190/ 159576 | consumed samples: 3040 | elapsed time per iteration (ms): 13662.7 | learning rate: 8.432E-07 | global batch size: 16 | lm loss: 8.619300E+00 | loss scale: 4096.0 | grad norm: 80825.030 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 191/ 159576 | consumed samples: 3056 | elapsed time per iteration (ms): 13715.7 | learning rate: 8.476E-07 | global batch size: 16 | lm loss: 8.438683E+00 | loss scale: 4096.0 | grad norm: 68255.978 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration (ms): 13611.5 | learning rate: 8.521E-07 | global batch size: 16 | lm loss: 8.685935E+00 | loss scale: 4096.0 | grad norm: 100702.747 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 193/ 159576 | consumed samples: 3088 | elapsed time per iteration (ms): 14234.2 | learning rate: 8.565E-07 | global batch size: 16 | lm loss: 8.644808E+00 | loss scale: 4096.0 | grad norm: 193299.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 194/ 159576 | consumed samples: 3104 | elapsed time per iteration (ms): 13631.4 | learning rate: 8.609E-07 | global batch size: 16 | lm loss: 8.574228E+00 | loss scale: 4096.0 | grad norm: 141638.439 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 195/ 159576 | consumed samples: 3120 | elapsed time per iteration (ms): 13610.1 | learning rate: 8.654E-07 | global batch size: 16 | lm loss: 8.461662E+00 | loss scale: 4096.0 | grad norm: 102623.541 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 196/ 159576 | consumed samples: 3136 | elapsed time per iteration (ms): 13581.2 | learning rate: 8.698E-07 | global batch size: 16 | lm loss: 8.478310E+00 | loss scale: 4096.0 | grad norm: 64740.797 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 197/ 159576 | consumed samples: 3152 | elapsed time per iteration (ms): 13626.3 | learning rate: 8.743E-07 | global batch size: 16 | lm loss: 8.468125E+00 | loss scale: 4096.0 | grad norm: 113590.460 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 198/ 159576 | consumed samples: 3168 | elapsed time per iteration (ms): 14045.8 | learning rate: 8.787E-07 | global batch size: 16 | lm loss: 8.800446E+00 | loss scale: 4096.0 | grad norm: 157117.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 199/ 159576 | consumed samples: 3184 | elapsed time per iteration (ms): 13670.2 | learning rate: 8.831E-07 | global batch size: 16 | lm loss: 8.530574E+00 | loss scale: 4096.0 | grad norm: 71020.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 200/ 159576 | consumed samples: 3200 | elapsed time per iteration (ms): 13673.4 | learning rate: 8.876E-07 | global batch size: 16 | lm loss: 8.573134E+00 | loss scale: 4096.0 | grad norm: 68974.846 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 201/ 159576 | consumed samples: 3216 | elapsed time per iteration (ms): 13793.0 | learning rate: 8.920E-07 | global batch size: 16 | lm loss: 8.408599E+00 | loss scale: 4096.0 | grad norm: 69080.768 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 202/ 159576 | consumed samples: 3232 | elapsed time per iteration (ms): 13826.3 | learning rate: 8.964E-07 | global batch size: 16 | lm loss: 8.511511E+00 | loss scale: 4096.0 | grad norm: 111260.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 203/ 159576 | consumed samples: 3248 | elapsed time per iteration (ms): 13532.8 | learning rate: 9.009E-07 | global batch size: 16 | lm loss: 8.359414E+00 | loss scale: 4096.0 | grad norm: 178104.845 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| time (ms) iteration 204/ 159576 | consumed samples: 3264 | elapsed time per iteration (ms): 13664.5 | learning rate: 9.053E-07 | global batch size: 16 | lm loss: 8.641071E+00 | loss scale: 4096.0 | grad norm: 200697.121 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 205/ 159576 | consumed samples: 3280 | elapsed time per iteration (ms): 13644.0 | learning rate: 9.098E-07 | global batch size: 16 | lm loss: 8.579686E+00 | loss scale: 4096.0 | grad norm: 127286.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 206/ 159576 | consumed samples: 3296 | elapsed time per iteration (ms): 14372.0 | learning rate: 9.142E-07 | global batch size: 16 | lm loss: 8.340457E+00 | loss scale: 4096.0 | grad norm: 79901.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 207/ 159576 | consumed samples: 3312 | elapsed time per iteration (ms): 13542.0 | learning rate: 9.186E-07 | global batch size: 16 | lm loss: 8.573874E+00 | loss scale: 4096.0 | grad norm: 54182.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 208/ 159576 | consumed samples: 3328 | elapsed time per iteration (ms): 13770.4 | learning rate: 9.231E-07 | global batch size: 16 | lm loss: 8.671753E+00 | loss scale: 4096.0 | grad norm: 118528.691 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 209/ 159576 | consumed samples: 3344 | elapsed time per iteration (ms): 13735.7 | learning rate: 9.275E-07 | global batch size: 16 | lm loss: 8.323320E+00 | loss scale: 4096.0 | grad norm: 84996.612 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 210/ 159576 | consumed samples: 3360 | elapsed time per iteration (ms): 13465.7 | learning rate: 9.320E-07 | global batch size: 16 | lm loss: 8.521966E+00 | loss scale: 4096.0 | grad norm: 58490.816 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 211/ 159576 | consumed samples: 3376 | elapsed time per iteration (ms): 14045.3 | learning rate: 9.364E-07 | global batch size: 16 | lm loss: 8.366361E+00 | loss scale: 4096.0 | grad norm: 60420.660 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 212/ 159576 | consumed samples: 3392 | elapsed time per iteration (ms): 13641.0 | learning rate: 9.408E-07 | global batch size: 16 | lm loss: 8.510538E+00 | loss scale: 4096.0 | grad norm: 107003.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 213/ 159576 | consumed samples: 3408 | elapsed time per iteration (ms): 13705.1 | learning rate: 9.453E-07 | global batch size: 16 | lm loss: 8.749462E+00 | loss scale: 4096.0 | grad norm: 127548.939 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 214/ 159576 | consumed samples: 3424 | elapsed time per iteration (ms): 13700.1 | learning rate: 9.497E-07 | global batch size: 16 | lm loss: 8.406161E+00 | loss scale: 4096.0 | grad norm: 77133.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 215/ 159576 | consumed samples: 3440 | elapsed time per iteration (ms): 14278.2 | learning rate: 9.541E-07 | global batch size: 16 | lm loss: 8.418405E+00 | loss scale: 4096.0 | grad norm: 62254.176 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 216/ 159576 | consumed samples: 3456 | elapsed time per iteration (ms): 13592.8 | learning rate: 9.586E-07 | global batch size: 16 | lm loss: 8.472538E+00 | loss scale: 4096.0 | grad norm: 50530.895 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 217/ 159576 | consumed samples: 3472 | elapsed time per iteration (ms): 13518.7 | learning rate: 9.630E-07 | global batch size: 16 | lm loss: 8.448650E+00 | loss scale: 4096.0 | grad norm: 80646.746 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 218/ 159576 | consumed samples: 3488 | elapsed time per iteration (ms): 13661.2 | learning rate: 9.675E-07 | global batch size: 16 | lm loss: 7.734177E+00 | loss scale: 4096.0 | grad norm: 149486.567 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 219/ 159576 | consumed samples: 3504 | elapsed time per iteration (ms): 14068.7 | learning rate: 9.719E-07 | global batch size: 16 | lm loss: 8.294590E+00 | loss scale: 4096.0 | grad norm: 56571.951 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 220/ 159576 | consumed samples: 3520 | elapsed time per iteration (ms): 13630.3 | learning rate: 9.763E-07 | global batch size: 16 | lm loss: 8.257124E+00 | loss scale: 4096.0 | grad norm: 62046.509 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 221/ 159576 | consumed samples: 3536 | elapsed time per iteration (ms): 13703.1 | learning rate: 9.808E-07 | global batch size: 16 | lm loss: 8.288898E+00 | loss scale: 4096.0 | grad norm: 59852.189 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 222/ 159576 | consumed samples: 3552 | elapsed time per iteration (ms): 13772.5 | learning rate: 9.852E-07 | global batch size: 16 | lm loss: 8.155066E+00 | loss scale: 4096.0 | grad norm: 58014.079 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 223/ 159576 | consumed samples: 3568 | elapsed time per iteration (ms): 13771.9 | learning rate: 9.896E-07 | global batch size: 16 | lm loss: 8.263331E+00 | loss scale: 4096.0 | grad norm: 63268.461 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 224/ 159576 | consumed samples: 3584 | elapsed time per iteration (ms): 14010.9 | learning rate: 9.941E-07 | global batch size: 16 | lm loss: 8.163802E+00 | loss scale: 4096.0 | grad norm: 57272.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 225/ 159576 | consumed samples: 3600 | elapsed time per iteration (ms): 13593.2 | learning rate: 9.985E-07 | global batch size: 16 | lm loss: 8.163125E+00 | loss scale: 4096.0 | grad norm: 42586.571 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 226/ 159576 | consumed samples: 3616 | elapsed time per iteration (ms): 13655.1 | learning rate: 1.003E-06 | global batch size: 16 | lm loss: 8.360060E+00 | loss scale: 4096.0 | grad norm: 122218.171 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 227/ 159576 | consumed samples: 3632 | elapsed time per iteration (ms): 13648.6 | learning rate: 1.007E-06 | global batch size: 16 | 
lm loss: 8.255043E+00 | loss scale: 4096.0 | grad norm: 85521.599 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 228/ 159576 | consumed samples: 3648 | elapsed time per iteration (ms): 14030.4 | learning rate: 1.012E-06 | global batch size: 16 | lm loss: 8.261985E+00 | loss scale: 4096.0 | grad norm: 67005.701 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 229/ 159576 | consumed samples: 3664 | elapsed time per iteration (ms): 13712.9 | learning rate: 1.016E-06 | global batch size: 16 | lm loss: 8.186491E+00 | loss scale: 4096.0 | grad norm: 56484.916 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 230/ 159576 | consumed samples: 3680 | elapsed time per iteration (ms): 13908.9 | learning rate: 1.021E-06 | global batch size: 16 | lm loss: 8.405298E+00 | loss scale: 4096.0 | grad norm: 76846.855 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 231/ 159576 | consumed samples: 3696 | elapsed time per iteration (ms): 13436.7 | learning rate: 1.025E-06 | global batch size: 16 | lm loss: 8.396565E+00 | loss scale: 4096.0 | grad norm: 65903.685 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 232/ 159576 | consumed samples: 3712 | elapsed time per iteration (ms): 13847.3 | learning rate: 1.030E-06 | global batch size: 16 | lm loss: 8.280029E+00 | loss scale: 4096.0 | grad norm: 49376.518 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 233/ 159576 | consumed samples: 3728 | elapsed time per iteration (ms): 13817.4 | learning rate: 1.034E-06 | global batch size: 16 | lm loss: 8.356775E+00 | loss scale: 4096.0 | grad norm: 59866.023 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 234/ 159576 | consumed samples: 3744 | elapsed time per iteration (ms): 13586.3 | learning rate: 1.038E-06 | global batch size: 16 | lm loss: 8.429869E+00 | loss scale: 4096.0 | grad norm: 177436.133 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 235/ 159576 | consumed samples: 3760 | elapsed time per iteration (ms): 13599.7 | learning rate: 1.043E-06 | global batch size: 16 | lm loss: 8.434436E+00 | loss scale: 4096.0 | grad norm: 135413.910 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 236/ 159576 | consumed samples: 3776 | elapsed time per iteration (ms): 13650.1 | learning rate: 1.047E-06 | global batch size: 16 | lm loss: 8.271558E+00 | loss scale: 4096.0 | grad norm: 90861.034 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 237/ 159576 | consumed samples: 3792 | elapsed time per iteration (ms): 14163.4 | learning rate: 1.052E-06 | global batch size: 16 | lm loss: 8.303068E+00 | loss scale: 4096.0 | grad norm: 54299.730 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 238/ 159576 | consumed samples: 3808 | elapsed time per iteration (ms): 13595.2 | learning rate: 1.056E-06 | global batch size: 16 | lm loss: 8.246891E+00 | loss scale: 4096.0 | grad norm: 58398.807 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 239/ 159576 | consumed samples: 3824 | elapsed time per 
iteration (ms): 13633.1 | learning rate: 1.061E-06 | global batch size: 16 | lm loss: 8.223282E+00 | loss scale: 4096.0 | grad norm: 58574.140 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 240/ 159576 | consumed samples: 3840 | elapsed time per iteration (ms): 13623.5 | learning rate: 1.065E-06 | global batch size: 16 | lm loss: 8.408007E+00 | loss scale: 4096.0 | grad norm: 128668.081 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 241/ 159576 | consumed samples: 3856 | elapsed time per iteration (ms): 14073.7 | learning rate: 1.070E-06 | global batch size: 16 | lm loss: 8.490035E+00 | loss scale: 4096.0 | grad norm: 228763.576 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 242/ 159576 | consumed samples: 3872 | elapsed time per iteration (ms): 13568.7 | learning rate: 1.074E-06 | global batch size: 16 | lm loss: 8.217072E+00 | loss scale: 4096.0 | grad norm: 54955.773 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 243/ 159576 | consumed samples: 3888 | elapsed time per iteration (ms): 13649.7 | learning rate: 1.078E-06 | global batch size: 16 | lm loss: 8.280759E+00 | loss scale: 4096.0 | grad norm: 70277.633 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 244/ 159576 | consumed samples: 3904 | elapsed time per iteration (ms): 13743.3 | learning rate: 1.083E-06 | global batch size: 16 | lm loss: 8.266622E+00 | loss scale: 4096.0 | grad norm: 52088.661 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 245/ 159576 | consumed samples: 3920 | elapsed time per iteration (ms): 13760.9 | learning rate: 1.087E-06 | global batch size: 16 | lm loss: 8.186391E+00 | loss scale: 4096.0 | grad norm: 45303.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 246/ 159576 | consumed samples: 3936 | elapsed time per iteration (ms): 13869.6 | learning rate: 1.092E-06 | global batch size: 16 | lm loss: 8.217053E+00 | loss scale: 4096.0 | grad norm: 66052.613 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 247/ 159576 | consumed samples: 3952 | elapsed time per iteration (ms): 13595.0 | learning rate: 1.096E-06 | global batch size: 16 | lm loss: 8.218720E+00 | loss scale: 4096.0 | grad norm: 63154.139 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 248/ 159576 | consumed samples: 3968 | elapsed time per iteration (ms): 13605.0 | learning rate: 1.101E-06 | global batch size: 16 | lm loss: 8.214328E+00 | loss scale: 4096.0 | grad norm: 54827.602 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 249/ 159576 | consumed samples: 3984 | elapsed time per iteration (ms): 13572.6 | learning rate: 1.105E-06 | global batch size: 16 | lm loss: 8.289627E+00 | loss scale: 4096.0 | grad norm: 112939.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 250/ 159576 | consumed samples: 4000 | elapsed time per iteration (ms): 13869.8 | learning rate: 1.109E-06 | global batch size: 16 | lm loss: 8.362014E+00 | loss scale: 4096.0 | grad norm: 56746.466 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
time (ms) iteration 251/ 159576 | consumed samples: 4016 | elapsed time per iteration (ms): 13620.5 | learning rate: 1.114E-06 | global batch size: 16 | lm loss: 8.189938E+00 | loss scale: 4096.0 | grad norm: 56152.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 252/ 159576 | consumed samples: 4032 | elapsed time per iteration (ms): 13708.2 | learning rate: 1.118E-06 | global batch size: 16 | lm loss: 8.356908E+00 | loss scale: 4096.0 | grad norm: 78498.467 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 253/ 159576 | consumed samples: 4048 | elapsed time per iteration (ms): 13478.4 | learning rate: 1.123E-06 | global batch size: 16 | lm loss: 8.047684E+00 | loss scale: 4096.0 | grad norm: 66252.882 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 254/ 159576 | consumed samples: 4064 | elapsed time per iteration (ms): 14231.8 | learning rate: 1.127E-06 | global batch size: 16 | lm loss: 8.279363E+00 | loss scale: 4096.0 | grad norm: 85125.935 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 255/ 159576 | consumed samples: 4080 | elapsed time per iteration (ms): 13522.4 | learning rate: 1.132E-06 | global batch size: 16 | lm loss: 8.159877E+00 | loss scale: 4096.0 | grad norm: 48952.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 256/ 159576 | consumed samples: 4096 | elapsed time per iteration (ms): 13553.5 | learning rate: 1.136E-06 | global batch size: 16 | lm loss: 8.154376E+00 | loss scale: 4096.0 | grad norm: 41715.920 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 257/ 159576 | consumed samples: 4112 | elapsed time per iteration (ms): 13537.5 | learning rate: 1.141E-06 | global batch size: 16 | lm loss: 8.247561E+00 | loss scale: 4096.0 | grad norm: 57864.708 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 258/ 159576 | consumed samples: 4128 | elapsed time per iteration (ms): 13659.5 | learning rate: 1.145E-06 | global batch size: 16 | lm loss: 8.167631E+00 | loss scale: 4096.0 | grad norm: 45439.745 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 259/ 159576 | consumed samples: 4144 | elapsed time per iteration (ms): 14023.4 | learning rate: 1.149E-06 | global batch size: 16 | lm loss: 8.081510E+00 | loss scale: 4096.0 | grad norm: 54108.939 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 260/ 159576 | consumed samples: 4160 | elapsed time per iteration (ms): 13447.5 | learning rate: 1.154E-06 | global batch size: 16 | lm loss: 8.074065E+00 | loss scale: 4096.0 | grad norm: 45799.989 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 261/ 159576 | consumed samples: 4176 | elapsed time per iteration (ms): 13604.0 | learning rate: 1.158E-06 | global batch size: 16 | lm loss: 8.134088E+00 | loss scale: 4096.0 | grad norm: 34426.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 262/ 159576 | consumed samples: 4192 | elapsed time per iteration (ms): 13632.5 | learning rate: 1.163E-06 | global batch size: 16 | lm loss: 8.331153E+00 | loss scale: 4096.0 | grad norm: 241742.321 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 263/ 159576 | consumed samples: 4208 | elapsed time per iteration (ms): 14049.0 | learning rate: 1.167E-06 | global batch size: 16 | lm loss: 8.300336E+00 | loss scale: 4096.0 | grad norm: 89382.639 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 264/ 159576 | consumed samples: 4224 | elapsed time per iteration (ms): 13554.0 | learning rate: 1.172E-06 | global batch size: 16 | lm loss: 8.285131E+00 | loss scale: 4096.0 | grad norm: 56471.162 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 265/ 159576 | consumed samples: 4240 | elapsed time per iteration (ms): 13594.4 | learning rate: 1.176E-06 | global batch size: 16 | lm loss: 8.247953E+00 | loss scale: 4096.0 | grad norm: 59934.542 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 266/ 159576 | consumed samples: 4256 | elapsed time per iteration (ms): 13722.5 | learning rate: 1.180E-06 | global batch size: 16 | lm loss: 8.086367E+00 | loss scale: 4096.0 | grad norm: 49794.894 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 267/ 159576 | consumed samples: 4272 | elapsed time per iteration (ms): 13925.6 | learning rate: 1.185E-06 | global batch size: 16 | lm loss: 8.364625E+00 | loss scale: 4096.0 | grad norm: 198667.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 268/ 159576 | consumed samples: 4288 | elapsed time per iteration (ms): 13685.9 | learning rate: 1.189E-06 | global batch size: 16 | lm loss: 8.378025E+00 | loss scale: 4096.0 | grad norm: 206726.678 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 269/ 159576 | consumed samples: 4304 | elapsed time per iteration (ms): 13784.2 | learning rate: 1.194E-06 | global batch size: 16 | lm loss: 8.309950E+00 | loss scale: 4096.0 | grad norm: 102692.516 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 270/ 159576 | consumed samples: 4320 | elapsed time per iteration (ms): 13426.6 | learning rate: 1.198E-06 | global batch size: 16 | lm loss: 8.437682E+00 | loss scale: 4096.0 | grad norm: 53779.480 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 271/ 159576 | consumed samples: 4336 | elapsed time per iteration (ms): 13590.5 | learning rate: 1.203E-06 | global batch size: 16 | lm loss: 8.180303E+00 | loss scale: 4096.0 | grad norm: 41837.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 272/ 159576 | consumed samples: 4352 | elapsed time per iteration (ms): 13918.1 | learning rate: 1.207E-06 | global batch size: 16 | lm loss: 8.269817E+00 | loss scale: 4096.0 | grad norm: 60250.869 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 273/ 159576 | consumed samples: 4368 | elapsed time per iteration (ms): 13764.9 | learning rate: 1.212E-06 | global batch size: 16 | lm loss: 8.196259E+00 | loss scale: 4096.0 | grad norm: 51310.508 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 274/ 159576 | consumed samples: 4384 | elapsed time per iteration (ms): 13543.7 | learning rate: 1.216E-06 | global batch size: 16 | lm 
loss: 8.111527E+00 | loss scale: 4096.0 | grad norm: 62869.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 275/ 159576 | consumed samples: 4400 | elapsed time per iteration (ms): 13741.6 | learning rate: 1.220E-06 | global batch size: 16 | lm loss: 8.196915E+00 | loss scale: 4096.0 | grad norm: 56382.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 276/ 159576 | consumed samples: 4416 | elapsed time per iteration (ms): 14418.6 | learning rate: 1.225E-06 | global batch size: 16 | lm loss: 8.163618E+00 | loss scale: 4096.0 | grad norm: 59897.745 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 277/ 159576 | consumed samples: 4432 | elapsed time per iteration (ms): 13488.6 | learning rate: 1.229E-06 | global batch size: 16 | lm loss: 8.232466E+00 | loss scale: 4096.0 | grad norm: 106883.652 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 278/ 159576 | consumed samples: 4448 | elapsed time per iteration (ms): 13680.7 | learning rate: 1.234E-06 | global batch size: 16 | lm loss: 8.285415E+00 | loss scale: 4096.0 | grad norm: 52155.013 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 279/ 159576 | consumed samples: 4464 | elapsed time per iteration (ms): 13663.3 | learning rate: 1.238E-06 | global batch size: 16 | lm loss: 8.221471E+00 | loss scale: 4096.0 | grad norm: 43151.453 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 280/ 159576 | consumed samples: 4480 | elapsed time per iteration (ms): 13783.3 | learning rate: 1.243E-06 | global batch size: 16 | lm loss: 7.827011E+00 | loss scale: 4096.0 | grad norm: 60081.852 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 281/ 159576 | consumed samples: 4496 | elapsed time per iteration (ms): 13993.1 | learning rate: 1.247E-06 | global batch size: 16 | lm loss: 8.016405E+00 | loss scale: 4096.0 | grad norm: 60969.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 282/ 159576 | consumed samples: 4512 | elapsed time per iteration (ms): 13747.2 | learning rate: 1.251E-06 | global batch size: 16 | lm loss: 8.205744E+00 | loss scale: 4096.0 | grad norm: 64657.162 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 283/ 159576 | consumed samples: 4528 | elapsed time per iteration (ms): 13732.1 | learning rate: 1.256E-06 | global batch size: 16 | lm loss: 8.225381E+00 | loss scale: 4096.0 | grad norm: 46007.720 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 284/ 159576 | consumed samples: 4544 | elapsed time per iteration (ms): 13701.8 | learning rate: 1.260E-06 | global batch size: 16 | lm loss: 8.069484E+00 | loss scale: 4096.0 | grad norm: 50539.571 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 285/ 159576 | consumed samples: 4560 | elapsed time per iteration (ms): 13774.1 | learning rate: 1.265E-06 | global batch size: 16 | lm loss: 8.313256E+00 | loss scale: 4096.0 | grad norm: 75301.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 286/ 159576 | consumed samples: 4576 | elapsed time per iteration 
(ms): 13700.1 | learning rate: 1.269E-06 | global batch size: 16 | lm loss: 8.296308E+00 | loss scale: 4096.0 | grad norm: 109402.142 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 287/ 159576 | consumed samples: 4592 | elapsed time per iteration (ms): 13678.1 | learning rate: 1.274E-06 | global batch size: 16 | lm loss: 8.245502E+00 | loss scale: 4096.0 | grad norm: 53639.635 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 288/ 159576 | consumed samples: 4608 | elapsed time per iteration (ms): 13698.6 | learning rate: 1.278E-06 | global batch size: 16 | lm loss: 8.137961E+00 | loss scale: 4096.0 | grad norm: 42750.465 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 289/ 159576 | consumed samples: 4624 | elapsed time per iteration (ms): 14172.7 | learning rate: 1.283E-06 | global batch size: 16 | lm loss: 8.187901E+00 | loss scale: 4096.0 | grad norm: 108265.490 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 290/ 159576 | consumed samples: 4640 | elapsed time per iteration (ms): 13663.7 | learning rate: 1.287E-06 | global batch size: 16 | lm loss: 8.092007E+00 | loss scale: 4096.0 | grad norm: 61613.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 291/ 159576 | consumed samples: 4656 | elapsed time per iteration (ms): 13802.2 | learning rate: 1.291E-06 | global batch size: 16 | lm loss: 8.140871E+00 | loss scale: 4096.0 | grad norm: 73138.188 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 292/ 159576 | consumed samples: 4672 | elapsed time per iteration (ms): 13588.8 | learning rate: 1.296E-06 | global batch size: 16 | lm loss: 8.096482E+00 | loss scale: 4096.0 | grad norm: 56947.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 293/ 159576 | consumed samples: 4688 | elapsed time per iteration (ms): 13692.3 | learning rate: 1.300E-06 | global batch size: 16 | lm loss: 8.261303E+00 | loss scale: 4096.0 | grad norm: 50306.115 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 294/ 159576 | consumed samples: 4704 | elapsed time per iteration (ms): 13953.1 | learning rate: 1.305E-06 | global batch size: 16 | lm loss: 8.088846E+00 | loss scale: 4096.0 | grad norm: 70651.882 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 295/ 159576 | consumed samples: 4720 | elapsed time per iteration (ms): 13681.7 | learning rate: 1.309E-06 | global batch size: 16 | lm loss: 8.216883E+00 | loss scale: 4096.0 | grad norm: 109748.850 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 296/ 159576 | consumed samples: 4736 | elapsed time per iteration (ms): 13680.1 | learning rate: 1.314E-06 | global batch size: 16 | lm loss: 8.011025E+00 | loss scale: 4096.0 | grad norm: 57863.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 297/ 159576 | consumed samples: 4752 | elapsed time per iteration (ms): 13766.7 | learning rate: 1.318E-06 | global batch size: 16 | lm loss: 8.023094E+00 | loss scale: 4096.0 | grad norm: 39732.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) 
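[Aside, not part of the original log output: the iteration records above follow a fixed "key: value | key: value" layout, so they are straightforward to parse mechanically. Below is a minimal Python sketch, assuming only the "iteration N/ M | key: value | ..." shape visible above (the file name train.log is purely illustrative), of how one might turn each record into a dict for plotting lm loss or grad norm; the bare "time (ms)" lines are skipped.]

import re

# Matches the start of an iteration record and captures the rest of the line.
ITER_RE = re.compile(r"iteration\s+(\d+)/\s*(\d+)\s*\|(.*)")

def parse_iteration_line(line):
    """Parse one 'iteration N/ M | key: value | ...' log line into a dict."""
    m = ITER_RE.search(line)
    if m is None:
        return None  # e.g. a bare 'time (ms)' line or banner noise
    record = {"iteration": int(m.group(1)), "train_iters": int(m.group(2))}
    for field in m.group(3).split("|"):
        if ":" not in field:
            continue  # skip empty trailing fields
        key, value = field.split(":", 1)
        key, value = key.strip(), value.strip()
        try:
            record[key] = float(value)  # handles 8.865972E+00, 4096.0, 16, ...
        except ValueError:
            record[key] = value
    return record

# Example usage (train.log is a hypothetical capture of this output):
# records = [r for r in map(parse_iteration_line, open("train.log")) if r]
# print(records[0]["lm loss"], records[0]["grad norm"])

[End of aside; the log continues below.]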
iteration 298/ 159576 | consumed samples: 4768 | elapsed time per iteration (ms): 14056.0 | learning rate: 1.322E-06 | global batch size: 16 | lm loss: 8.085699E+00 | loss scale: 4096.0 | grad norm: 93534.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 299/ 159576 | consumed samples: 4784 | elapsed time per iteration (ms): 13507.1 | learning rate: 1.327E-06 | global batch size: 16 | lm loss: 8.410425E+00 | loss scale: 4096.0 | grad norm: 42550.581 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 300/ 159576 | consumed samples: 4800 | elapsed time per iteration (ms): 13670.9 | learning rate: 1.331E-06 | global batch size: 16 | lm loss: 8.125405E+00 | loss scale: 4096.0 | grad norm: 37244.445 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 301/ 159576 | consumed samples: 4816 | elapsed time per iteration (ms): 13643.0 | learning rate: 1.336E-06 | global batch size: 16 | lm loss: 7.945562E+00 | loss scale: 4096.0 | grad norm: 37921.680 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 302/ 159576 | consumed samples: 4832 | elapsed time per iteration (ms): 14097.2 | learning rate: 1.340E-06 | global batch size: 16 | lm loss: 8.073545E+00 | loss scale: 4096.0 | grad norm: 80879.552 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 303/ 159576 | consumed samples: 4848 | elapsed time per iteration (ms): 13625.2 | learning rate: 1.345E-06 | global batch size: 16 | lm loss: 8.224352E+00 | loss scale: 4096.0 | grad norm: 75920.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 304/ 159576 | consumed samples: 4864 | elapsed time per iteration (ms): 13709.0 | learning rate: 1.349E-06 | global batch size: 16 | lm loss: 8.025059E+00 | loss scale: 4096.0 | grad norm: 39535.605 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 305/ 159576 | consumed samples: 4880 | elapsed time per iteration (ms): 13741.5 | learning rate: 1.354E-06 | global batch size: 16 | lm loss: 8.094482E+00 | loss scale: 4096.0 | grad norm: 40630.922 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 306/ 159576 | consumed samples: 4896 | elapsed time per iteration (ms): 13523.7 | learning rate: 1.358E-06 | global batch size: 16 | lm loss: 8.135887E+00 | loss scale: 4096.0 | grad norm: 80825.550 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 307/ 159576 | consumed samples: 4912 | elapsed time per iteration (ms): 14093.4 | learning rate: 1.362E-06 | global batch size: 16 | lm loss: 8.292034E+00 | loss scale: 4096.0 | grad norm: 86171.888 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 308/ 159576 | consumed samples: 4928 | elapsed time per iteration (ms): 13647.9 | learning rate: 1.367E-06 | global batch size: 16 | lm loss: 8.204563E+00 | loss scale: 4096.0 | grad norm: 46698.010 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 309/ 159576 | consumed samples: 4944 | elapsed time per iteration (ms): 13637.2 | learning rate: 1.371E-06 | global batch size: 16 | lm loss: 8.033182E+00 | loss scale: 4096.0 | grad norm: 42089.185 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 310/ 159576 | consumed samples: 4960 | elapsed time per iteration (ms): 13700.6 | learning rate: 1.376E-06 | global batch size: 16 | lm loss: 8.048797E+00 | loss scale: 4096.0 | grad norm: 56022.805 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 311/ 159576 | consumed samples: 4976 | elapsed time per iteration (ms): 14085.5 | learning rate: 1.380E-06 | global batch size: 16 | lm loss: 7.623003E+00 | loss scale: 4096.0 | grad norm: 72171.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 312/ 159576 | consumed samples: 4992 | elapsed time per iteration (ms): 13830.9 | learning rate: 1.385E-06 | global batch size: 16 | lm loss: 8.082812E+00 | loss scale: 4096.0 | grad norm: 39681.453 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 313/ 159576 | consumed samples: 5008 | elapsed time per iteration (ms): 13533.9 | learning rate: 1.389E-06 | global batch size: 16 | lm loss: 8.116117E+00 | loss scale: 4096.0 | grad norm: 33726.889 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 314/ 159576 | consumed samples: 5024 | elapsed time per iteration (ms): 13637.3 | learning rate: 1.393E-06 | global batch size: 16 | lm loss: 8.210217E+00 | loss scale: 4096.0 | grad norm: 89402.073 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 315/ 159576 | consumed samples: 5040 | elapsed time per iteration (ms): 14136.6 | learning rate: 1.398E-06 | global batch size: 16 | lm loss: 7.798199E+00 | loss scale: 4096.0 | grad norm: 83566.570 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 316/ 159576 | consumed samples: 5056 | elapsed time per iteration (ms): 13651.3 | learning rate: 1.402E-06 | global batch size: 16 | lm loss: 8.066372E+00 | loss scale: 4096.0 | grad norm: 38768.697 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 317/ 159576 | consumed samples: 5072 | elapsed time per iteration (ms): 13641.7 | learning rate: 1.407E-06 | global batch size: 16 | lm loss: 7.876265E+00 | loss scale: 4096.0 | grad norm: 36174.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 318/ 159576 | consumed samples: 5088 | elapsed time per iteration (ms): 13653.8 | learning rate: 1.411E-06 | global batch size: 16 | lm loss: 7.979768E+00 | loss scale: 4096.0 | grad norm: 66651.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 319/ 159576 | consumed samples: 5104 | elapsed time per iteration (ms): 13755.9 | learning rate: 1.416E-06 | global batch size: 16 | lm loss: 8.094232E+00 | loss scale: 4096.0 | grad norm: 79088.558 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 320/ 159576 | consumed samples: 5120 | elapsed time per iteration (ms): 13900.8 | learning rate: 1.420E-06 | global batch size: 16 | lm loss: 8.113304E+00 | loss scale: 4096.0 | grad norm: 52331.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 321/ 159576 | consumed samples: 5136 | elapsed time per iteration (ms): 13649.9 | learning rate: 1.425E-06 | global batch size: 16 | lm loss: 8.128990E+00 | loss scale: 4096.0 | grad norm: 46927.679 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 322/ 159576 | consumed samples: 5152 | elapsed time per iteration (ms): 13693.6 | learning rate: 1.429E-06 | global batch size: 16 | lm loss: 8.486778E+00 | loss scale: 4096.0 | grad norm: 89462.672 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 323/ 159576 | consumed samples: 5168 | elapsed time per iteration (ms): 13699.8 | learning rate: 1.433E-06 | global batch size: 16 | lm loss: 8.051263E+00 | loss scale: 4096.0 | grad norm: 42680.523 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 324/ 159576 | consumed samples: 5184 | elapsed time per iteration (ms): 14041.8 | learning rate: 1.438E-06 | global batch size: 16 | lm loss: 8.181097E+00 | loss scale: 4096.0 | grad norm: 43801.136 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 325/ 159576 | consumed samples: 5200 | elapsed time per iteration (ms): 13711.0 | learning rate: 1.442E-06 | global batch size: 16 | lm loss: 8.171723E+00 | loss scale: 4096.0 | grad norm: 47748.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 326/ 159576 | consumed samples: 5216 | elapsed time per iteration (ms): 13743.3 | learning rate: 1.447E-06 | global batch size: 16 | lm loss: 8.035454E+00 | loss scale: 4096.0 | grad norm: 58353.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 327/ 159576 | consumed samples: 5232 | elapsed time per iteration (ms): 13602.7 | learning rate: 1.451E-06 | global batch size: 16 | lm loss: 8.021453E+00 | loss scale: 4096.0 | grad norm: 44165.609 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 328/ 159576 | consumed samples: 5248 | elapsed time per iteration (ms): 13748.9 | learning rate: 1.456E-06 | global batch size: 16 | lm loss: 8.051726E+00 | loss scale: 4096.0 | grad norm: 35138.807 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 329/ 159576 | consumed samples: 5264 | elapsed time per iteration (ms): 13961.7 | learning rate: 1.460E-06 | global batch size: 16 | lm loss: 7.960547E+00 | loss scale: 4096.0 | grad norm: 41197.060 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 330/ 159576 | consumed samples: 5280 | elapsed time per iteration (ms): 13633.4 | learning rate: 1.464E-06 | global batch size: 16 | lm loss: 8.084079E+00 | loss scale: 4096.0 | grad norm: 43199.182 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 331/ 159576 | consumed samples: 5296 | elapsed time per iteration (ms): 13678.9 | learning rate: 1.469E-06 | global batch size: 16 | lm loss: 8.243130E+00 | loss scale: 4096.0 | grad norm: 39935.584 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 332/ 159576 | consumed samples: 5312 | elapsed time per iteration (ms): 13653.3 | learning rate: 1.473E-06 | global batch size: 16 | lm loss: 8.148146E+00 | loss scale: 4096.0 | grad norm: 31710.971 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 333/ 159576 | consumed samples: 5328 | elapsed time per iteration (ms): 13982.9 | learning rate: 1.478E-06 | global batch size: 16 | lm loss: 8.055049E+00 | loss scale: 4096.0 | grad norm: 40555.458 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 334/ 159576 | consumed samples: 5344 | elapsed time per iteration (ms): 13576.5 | learning rate: 1.482E-06 | global batch size: 16 | lm loss: 8.154724E+00 | loss scale: 4096.0 | grad norm: 98189.157 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 335/ 159576 | consumed samples: 5360 | elapsed time per iteration (ms): 13666.3 | learning rate: 1.487E-06 | global batch size: 16 | lm loss: 8.056485E+00 | loss scale: 4096.0 | grad norm: 53277.066 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 336/ 159576 | consumed samples: 5376 | elapsed time per iteration (ms): 13667.7 | learning rate: 1.491E-06 | global batch size: 16 | lm loss: 7.902112E+00 | loss scale: 4096.0 | grad norm: 35520.620 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 337/ 159576 | consumed samples: 5392 | elapsed time per iteration (ms): 14189.1 | learning rate: 1.496E-06 | global batch size: 16 | lm loss: 8.211933E+00 | loss scale: 4096.0 | grad norm: 102636.452 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 338/ 159576 | consumed samples: 5408 | elapsed time per iteration (ms): 13538.3 | learning rate: 1.500E-06 | global batch size: 16 | lm loss: 8.077993E+00 | loss scale: 4096.0 | grad norm: 74161.424 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 339/ 159576 | consumed samples: 5424 | elapsed time per iteration (ms): 13690.1 | learning rate: 1.504E-06 | global batch size: 16 | lm loss: 8.002722E+00 | loss scale: 4096.0 | grad norm: 41178.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 340/ 159576 | consumed samples: 5440 | elapsed time per iteration (ms): 13761.4 | learning rate: 1.509E-06 | global batch size: 16 | lm loss: 8.070647E+00 | loss scale: 4096.0 | grad norm: 146660.160 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 341/ 159576 | consumed samples: 5456 | elapsed time per iteration (ms): 13679.6 | learning rate: 1.513E-06 | global batch size: 16 | lm loss: 8.211810E+00 | loss scale: 4096.0 | grad norm: 56011.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 342/ 159576 | consumed samples: 5472 | elapsed time per iteration (ms): 13958.7 | learning rate: 1.518E-06 | global batch size: 16 | lm loss: 8.028828E+00 | loss scale: 4096.0 | grad norm: 45507.509 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 343/ 159576 | consumed samples: 5488 | elapsed time per iteration (ms): 13796.1 | learning rate: 1.522E-06 | global batch size: 16 | lm loss: 8.000618E+00 | loss scale: 4096.0 | grad norm: 41366.016 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 344/ 159576 | consumed samples: 5504 | elapsed time per iteration (ms): 13566.5 | learning rate: 1.527E-06 | global batch size: 16 | lm loss: 8.106353E+00 | loss scale: 4096.0 | grad norm: 86487.826 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 345/ 159576 | consumed samples: 5520 | elapsed time per iteration (ms): 13617.7 | learning rate: 1.531E-06 | global batch size: 16 | lm loss: 8.130958E+00 | loss scale: 4096.0 | grad norm: 65559.636 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 346/ 159576 | consumed samples: 5536 | elapsed time per iteration (ms): 14006.3 | learning rate: 1.536E-06 | global batch size: 16 | lm loss: 8.100373E+00 | loss scale: 4096.0 | grad norm: 50918.888 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 347/ 159576 | consumed samples: 5552 | elapsed time per iteration (ms): 13652.0 | learning rate: 1.540E-06 | global batch size: 16 | lm loss: 8.193462E+00 | loss scale: 4096.0 | grad norm: 49482.923 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 348/ 159576 | consumed samples: 5568 | elapsed time per iteration (ms): 13785.4 | learning rate: 1.544E-06 | global batch size: 16 | lm loss: 8.185720E+00 | loss scale: 4096.0 | grad norm: 33616.818 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 349/ 159576 | consumed samples: 5584 | elapsed time per iteration (ms): 13534.7 | learning rate: 1.549E-06 | global batch size: 16 | lm loss: 7.997324E+00 | loss scale: 4096.0 | grad norm: 41224.808 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 350/ 159576 | consumed samples: 5600 | elapsed time per iteration (ms): 14148.0 | learning rate: 1.553E-06 | global batch size: 16 | lm loss: 8.069170E+00 | loss scale: 4096.0 | grad norm: 61139.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 351/ 159576 | consumed samples: 5616 | elapsed time per iteration (ms): 13626.0 | learning rate: 1.558E-06 | global batch size: 16 | lm loss: 8.052499E+00 | loss scale: 4096.0 | grad norm: 58965.426 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 352/ 159576 | consumed samples: 5632 | elapsed time per iteration (ms): 13633.5 | learning rate: 1.562E-06 | global batch size: 16 | lm loss: 8.036291E+00 | loss scale: 4096.0 | grad norm: 38820.487 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 353/ 159576 | consumed samples: 5648 | elapsed time per iteration (ms): 13648.6 | learning rate: 1.567E-06 | global batch size: 16 | lm loss: 8.007360E+00 | loss scale: 4096.0 | grad norm: 33342.929 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 354/ 159576 | consumed samples: 5664 | elapsed time per iteration (ms): 13707.0 | learning rate: 1.571E-06 | global batch size: 16 | lm loss: 7.890161E+00 | loss scale: 4096.0 | grad norm: 62589.896 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 355/ 159576 | consumed samples: 5680 | elapsed time per iteration (ms): 14101.4 | learning rate: 1.575E-06 | global batch size: 16 | lm loss: 8.034273E+00 | loss scale: 4096.0 | grad norm: 62100.784 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 356/ 159576 | consumed samples: 5696 | elapsed time per iteration (ms): 13548.4 | learning rate: 1.580E-06 | global batch size: 16 | lm loss: 7.964279E+00 | loss scale: 4096.0 | grad norm: 37283.643 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 357/ 159576 | consumed samples: 5712 | elapsed time per iteration (ms): 13655.3 | learning rate: 1.584E-06 | global batch size: 16 | lm loss: 7.882459E+00 | loss scale: 4096.0 | grad norm: 36278.786 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 358/ 159576 | consumed samples: 5728 | elapsed time per iteration (ms): 13872.1 | learning rate: 1.589E-06 | global batch size: 16 | lm loss: 8.081428E+00 | loss scale: 4096.0 | grad norm: 59624.520 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 359/ 159576 | consumed samples: 5744 | elapsed time per iteration (ms): 13830.3 | learning rate: 1.593E-06 | global batch size: 16 | lm loss: 8.345490E+00 | loss scale: 4096.0 | grad norm: 101818.152 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 360/ 159576 | consumed samples: 5760 | elapsed time per iteration (ms): 13738.3 | learning rate: 1.598E-06 | global batch size: 16 | lm loss: 8.090802E+00 | loss scale: 4096.0 | grad norm: 37735.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 361/ 159576 | consumed samples: 5776 | elapsed time per iteration (ms): 13673.7 | learning rate: 1.602E-06 | global batch size: 16 | lm loss: 7.934822E+00 | loss scale: 4096.0 | grad norm: 35051.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 362/ 159576 | consumed samples: 5792 | elapsed time per iteration (ms): 13779.0 | learning rate: 1.607E-06 | global batch size: 16 | lm loss: 8.217977E+00 | loss scale: 4096.0 | grad norm: 81671.155 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 363/ 159576 | consumed samples: 5808 | elapsed time per iteration (ms): 14148.6 | learning rate: 1.611E-06 | global batch size: 16 | lm loss: 7.956856E+00 | loss scale: 4096.0 | grad norm: 123728.069 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 364/ 159576 | consumed samples: 5824 | elapsed time per iteration (ms): 13509.6 | learning rate: 1.615E-06 | global batch size: 16 | lm loss: 7.980748E+00 | loss scale: 4096.0 | grad norm: 64323.538 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 365/ 159576 | consumed samples: 5840 | elapsed time per iteration (ms): 13791.1 | learning rate: 1.620E-06 | global batch size: 16 | lm loss: 7.927495E+00 | loss scale: 4096.0 | grad norm: 38595.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 366/ 159576 | consumed samples: 5856 | elapsed time per iteration (ms): 13535.8 | learning rate: 1.624E-06 | global batch size: 16 | lm loss: 7.992770E+00 | loss scale: 4096.0 | grad norm: 34786.799 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 367/ 159576 | consumed samples: 5872 | elapsed time per iteration (ms): 13709.6 | learning rate: 1.629E-06 | global batch size: 16 | lm loss: 8.033854E+00 | loss scale: 4096.0 | grad norm: 26681.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 368/ 159576 | consumed samples: 5888 | elapsed time per iteration (ms): 13923.8 | learning rate: 1.633E-06 | global batch size: 16 | lm loss: 8.086361E+00 | loss scale: 4096.0 | grad norm: 116063.612 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 369/ 159576 | consumed samples: 5904 | elapsed time per iteration (ms): 13743.2 | learning rate: 1.638E-06 | global batch size: 16 | lm loss: 8.136069E+00 | loss scale: 4096.0 | grad norm: 192843.981 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 370/ 159576 | consumed samples: 5920 | elapsed time per iteration (ms): 13586.5 | learning rate: 1.642E-06 | global batch size: 16 | lm loss: 8.213842E+00 | loss scale: 4096.0 | grad norm: 66749.630 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 371/ 159576 | consumed samples: 5936 | elapsed time per iteration (ms): 13637.5 | learning rate: 1.646E-06 | global batch size: 16 | lm loss: 7.862526E+00 | loss scale: 4096.0 | grad norm: 35628.877 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 372/ 159576 | consumed samples: 5952 | elapsed time per iteration (ms): 14269.3 | learning rate: 1.651E-06 | global batch size: 16 | lm loss: 8.111351E+00 | loss scale: 4096.0 | grad norm: 51284.654 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 373/ 159576 | consumed samples: 5968 | elapsed time per iteration (ms): 13424.8 | learning rate: 1.655E-06 | global batch size: 16 | lm loss: 7.860275E+00 | loss scale: 4096.0 | grad norm: 51885.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 374/ 159576 | consumed samples: 5984 | elapsed time per iteration (ms): 13638.9 | learning rate: 1.660E-06 | global batch size: 16 | lm loss: 7.995843E+00 | loss scale: 4096.0 | grad norm: 40982.716 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 375/ 159576 | consumed samples: 6000 | elapsed time per iteration (ms): 13719.8 | learning rate: 1.664E-06 | global batch size: 16 | lm loss: 7.989121E+00 | loss scale: 4096.0 | grad norm: 43694.588 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 376/ 159576 | consumed samples: 6016 | elapsed time per iteration (ms): 13718.2 | learning rate: 1.669E-06 | global batch size: 16 | lm loss: 8.054690E+00 | loss scale: 4096.0 | grad norm: 56142.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 377/ 159576 | consumed samples: 6032 | elapsed time per iteration (ms): 14087.0 | learning rate: 1.673E-06 | global batch size: 16 | lm loss: 8.145277E+00 | loss scale: 4096.0 | grad norm: 77837.877 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 378/ 159576 | consumed samples: 6048 | elapsed time per iteration (ms): 13621.7 | learning rate: 1.678E-06 | global batch size: 16 | lm loss: 7.879861E+00 | loss scale: 4096.0 | grad norm: 35054.780 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 379/ 159576 | consumed samples: 6064 | elapsed time per iteration (ms): 13676.7 | learning rate: 1.682E-06 | global batch size: 16 | lm loss: 7.996103E+00 | loss scale: 4096.0 | grad norm: 31871.611 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 380/ 159576 | consumed samples: 6080 | elapsed time per iteration (ms): 13756.2 | learning rate: 1.686E-06 | global batch size: 16 | lm loss: 7.788074E+00 | loss scale: 4096.0 | grad norm: 30378.507 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 381/ 159576 | consumed samples: 6096 | elapsed time per iteration (ms): 13731.7 | learning rate: 1.691E-06 | global batch size: 16 | lm loss: 7.998044E+00 | loss scale: 4096.0 | grad norm: 78167.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 382/ 159576 | consumed samples: 6112 | elapsed time per iteration (ms): 13696.8 | learning rate: 1.695E-06 | global batch size: 16 | lm loss: 8.001510E+00 | loss scale: 4096.0 | grad norm: 57981.800 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 383/ 159576 | consumed samples: 6128 | elapsed time per iteration (ms): 13688.0 | learning rate: 1.700E-06 | global batch size: 16 | lm loss: 8.043833E+00 | loss scale: 4096.0 | grad norm: 40631.885 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 384/ 159576 | consumed samples: 6144 | elapsed time per iteration (ms): 13680.4 | learning rate: 1.704E-06 | global batch size: 16 | lm loss: 8.029270E+00 | loss scale: 4096.0 | grad norm: 31579.477 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 385/ 159576 | consumed samples: 6160 | elapsed time per iteration (ms): 14057.5 | learning rate: 1.709E-06 | global batch size: 16 | lm loss: 8.156369E+00 | loss scale: 4096.0 | grad norm: 87842.060 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 386/ 159576 | consumed samples: 6176 | elapsed time per iteration (ms): 13765.1 | learning rate: 1.713E-06 | global batch size: 16 | lm loss: 8.024692E+00 | loss scale: 4096.0 | grad norm: 56881.857 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 387/ 159576 | consumed samples: 6192 | elapsed time per iteration (ms): 13768.8 | learning rate: 1.717E-06 | global batch size: 16 | lm loss: 7.997876E+00 | loss scale: 4096.0 | grad norm: 31105.819 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 388/ 159576 | consumed samples: 6208 | elapsed time per iteration (ms): 13433.5 | learning rate: 1.722E-06 | global batch size: 16 | lm loss: 7.985063E+00 | loss scale: 4096.0 | grad norm: 78090.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 389/ 159576 | consumed samples: 6224 | elapsed time per iteration (ms): 13675.2 | learning rate: 1.726E-06 | global batch size: 16 | lm loss: 7.926050E+00 | loss scale: 4096.0 | grad norm: 61534.683 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 390/ 159576 | consumed samples: 6240 | elapsed time per iteration (ms): 13989.4 | learning rate: 1.731E-06 | global batch size: 16 | lm loss: 7.938218E+00 | loss scale: 4096.0 | grad norm: 37749.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 391/ 159576 | consumed samples: 6256 | elapsed time per iteration (ms): 13663.4 | learning rate: 1.735E-06 | global batch size: 16 | lm loss: 7.835842E+00 | loss scale: 4096.0 | grad norm: 48700.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 392/ 159576 | consumed samples: 6272 | elapsed time per iteration (ms): 13682.5 | learning rate: 1.740E-06 | global batch size: 16 | lm loss: 7.976984E+00 | loss scale: 4096.0 | grad norm: 45273.731 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 393/ 159576 | consumed samples: 6288 | elapsed time per iteration (ms): 13680.3 | learning rate: 1.744E-06 | global batch size: 16 | lm loss: 8.063533E+00 | loss scale: 4096.0 | grad norm: 62966.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 394/ 159576 | consumed samples: 6304 | elapsed time per iteration (ms): 14158.6 | learning rate: 1.749E-06 | global batch size: 16 | lm loss: 7.962408E+00 | loss scale: 4096.0 | grad norm: 38917.941 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 395/ 159576 | consumed samples: 6320 | elapsed time per iteration (ms): 13412.3 | learning rate: 1.753E-06 | global batch size: 16 | lm loss: 7.930057E+00 | loss scale: 4096.0 | grad norm: 59046.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 396/ 159576 | consumed samples: 6336 | elapsed time per iteration (ms): 13631.9 | learning rate: 1.757E-06 | global batch size: 16 | lm loss: 8.137497E+00 | loss scale: 4096.0 | grad norm: 51299.741 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 397/ 159576 | consumed samples: 6352 | elapsed time per iteration (ms): 13706.0 | learning rate: 1.762E-06 | global batch size: 16 | lm loss: 8.020626E+00 | loss scale: 4096.0 | grad norm: 37056.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 398/ 159576 | consumed samples: 6368 | elapsed time per iteration (ms): 14158.0 | learning rate: 1.766E-06 | global batch size: 16 | lm loss: 8.114269E+00 | loss scale: 4096.0 | grad norm: 64105.827 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 399/ 159576 | consumed samples: 6384 | elapsed time per iteration (ms): 13628.9 | learning rate: 1.771E-06 | global batch size: 16 | lm loss: 8.186448E+00 | loss scale: 4096.0 | grad norm: 55633.908 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 400/ 159576 | consumed samples: 6400 | elapsed time per iteration (ms): 13727.5 | learning rate: 1.775E-06 | global batch size: 16 | lm loss: 8.182411E+00 | loss scale: 4096.0 | grad norm: 51312.945 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 401/ 159576 | consumed samples: 6416 | elapsed time per iteration (ms): 13749.7 | learning rate: 1.780E-06 | global batch size: 16 | lm loss: 8.020710E+00 | loss scale: 4096.0 | grad norm: 32983.756 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 402/ 159576 | consumed samples: 6432 | elapsed time per iteration (ms): 13473.4 | learning rate: 1.784E-06 | global batch size: 16 | lm loss: 7.970335E+00 | loss scale: 4096.0 | grad norm: 70699.597 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 403/ 159576 | consumed samples: 6448 | elapsed time per iteration (ms): 13904.7 | learning rate: 1.788E-06 | global batch size: 16 | lm loss: 7.993033E+00 | loss scale: 4096.0 | grad norm: 67107.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 404/ 159576 | consumed samples: 6464 | elapsed time per iteration (ms): 13683.9 | learning rate: 1.793E-06 | global batch size: 16 | lm loss: 8.091874E+00 | loss scale: 4096.0 | grad norm: 26716.683 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 405/ 159576 | consumed samples: 6480 | elapsed time per iteration (ms): 13642.3 | learning rate: 1.797E-06 | global batch size: 16 | lm loss: 8.088682E+00 | loss scale: 4096.0 | grad norm: 74507.909 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 406/ 159576 | consumed samples: 6496 | elapsed time per iteration (ms): 13688.7 | learning rate: 1.802E-06 | global batch size: 16 | lm loss: 8.134460E+00 | loss scale: 4096.0 | grad norm: 64155.050 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 407/ 159576 | consumed samples: 6512 | elapsed time per iteration (ms): 14175.7 | learning rate: 1.806E-06 | global batch size: 16 | lm loss: 8.105555E+00 | loss scale: 4096.0 | grad norm: 39464.479 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 408/ 159576 | consumed samples: 6528 | elapsed time per iteration (ms): 13703.7 | learning rate: 1.811E-06 | global batch size: 16 | lm loss: 7.988219E+00 | loss scale: 4096.0 | grad norm: 39779.639 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 409/ 159576 | consumed samples: 6544 | elapsed time per iteration (ms): 13499.5 | learning rate: 1.815E-06 | global batch size: 16 | lm loss: 7.931721E+00 | loss scale: 4096.0 | grad norm: 46421.169 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 410/ 159576 | consumed samples: 6560 | elapsed time per iteration (ms): 13608.5 | learning rate: 1.820E-06 | global batch size: 16 | lm loss: 7.944845E+00 | loss scale: 4096.0 | grad norm: 28537.165 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 411/ 159576 | consumed samples: 6576 | elapsed time per iteration (ms): 14088.6 | learning rate: 1.824E-06 | global batch size: 16 | lm loss: 7.955441E+00 | loss scale: 4096.0 | grad norm: 68818.472 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 412/ 159576 | consumed samples: 6592 | elapsed time per iteration (ms): 13613.5 | learning rate: 1.828E-06 | global batch size: 16 | lm loss: 8.293702E+00 | loss scale: 4096.0 | grad norm: 73315.445 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 413/ 159576 | consumed samples: 6608 | elapsed time per iteration (ms): 13670.1 | learning rate: 1.833E-06 | global batch size: 16 | lm loss: 7.982622E+00 | loss scale: 4096.0 | grad norm: 40882.033 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 414/ 159576 | consumed samples: 6624 | elapsed time per iteration (ms): 13753.2 | learning rate: 1.837E-06 | global batch size: 16 | lm loss: 7.981937E+00 | loss scale: 4096.0 | grad norm: 34929.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 415/ 159576 | consumed samples: 6640 | elapsed time per iteration (ms): 13749.7 | learning rate: 1.842E-06 | global batch size: 16 | lm loss: 8.060836E+00 | loss
scale: 4096.0 | grad norm: 47572.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 416/ 159576 | consumed samples: 6656 | elapsed time per iteration (ms): 13758.6 | learning rate: 1.846E-06 | global batch size: 16 | lm loss: 8.002974E+00 | loss scale: 4096.0 | grad norm: 37872.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 417/ 159576 | consumed samples: 6672 | elapsed time per iteration (ms): 13599.2 | learning rate: 1.851E-06 | global batch size: 16 | lm loss: 7.972270E+00 | loss scale: 4096.0 | grad norm: 44233.921 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 418/ 159576 | consumed samples: 6688 | elapsed time per iteration (ms): 13571.0 | learning rate: 1.855E-06 | global batch size: 16 | lm loss: 8.249717E+00 | loss scale: 4096.0 | grad norm: 60770.929 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 419/ 159576 | consumed samples: 6704 | elapsed time per iteration (ms): 13598.5 | learning rate: 1.859E-06 | global batch size: 16 | lm loss: 7.861569E+00 | loss scale: 4096.0 | grad norm: 31277.711 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 420/ 159576 | consumed samples: 6720 | elapsed time per iteration (ms): 14077.1 | learning rate: 1.864E-06 | global batch size: 16 | lm loss: 7.965170E+00 | loss scale: 4096.0 | grad norm: 72793.609 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 421/ 159576 | consumed samples: 6736 | elapsed time per iteration (ms): 13383.0 | learning rate: 1.868E-06 | global batch size: 16 | lm loss: 7.907632E+00 | loss scale: 4096.0 | grad norm: 60405.796 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 422/ 159576 | consumed samples: 6752 | elapsed time per iteration (ms): 13739.1 | learning rate: 1.873E-06 | global batch size: 16 | lm loss: 8.041030E+00 | loss scale: 4096.0 | grad norm: 49156.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 423/ 159576 | consumed samples: 6768 | elapsed time per iteration (ms): 13364.3 | learning rate: 1.877E-06 | global batch size: 16 | lm loss: 7.965994E+00 | loss scale: 4096.0 | grad norm: 37382.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 424/ 159576 | consumed samples: 6784 | elapsed time per iteration (ms): 13509.2 | learning rate: 1.882E-06 | global batch size: 16 | lm loss: 7.979969E+00 | loss scale: 4096.0 | grad norm: 30214.011 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 425/ 159576 | consumed samples: 6800 | elapsed time per iteration (ms): 13784.5 | learning rate: 1.886E-06 | global batch size: 16 | lm loss: 7.877289E+00 | loss scale: 4096.0 | grad norm: 31571.817 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 426/ 159576 | consumed samples: 6816 | elapsed time per iteration (ms): 13491.5 | learning rate: 1.891E-06 | global batch size: 16 | lm loss: 8.049381E+00 | loss scale: 4096.0 | grad norm: 61185.189 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 427/ 159576 | consumed samples: 6832 | elapsed time per iteration (ms): 13530.6 | learning 
rate: 1.895E-06 | global batch size: 16 | lm loss: 7.963693E+00 | loss scale: 4096.0 | grad norm: 45639.191 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 428/ 159576 | consumed samples: 6848 | elapsed time per iteration (ms): 13594.4 | learning rate: 1.899E-06 | global batch size: 16 | lm loss: 7.874112E+00 | loss scale: 4096.0 | grad norm: 34163.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 429/ 159576 | consumed samples: 6864 | elapsed time per iteration (ms): 14157.2 | learning rate: 1.904E-06 | global batch size: 16 | lm loss: 8.141135E+00 | loss scale: 4096.0 | grad norm: 43864.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 430/ 159576 | consumed samples: 6880 | elapsed time per iteration (ms): 13539.3 | learning rate: 1.908E-06 | global batch size: 16 | lm loss: 7.883408E+00 | loss scale: 4096.0 | grad norm: 38957.139 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 431/ 159576 | consumed samples: 6896 | elapsed time per iteration (ms): 13542.5 | learning rate: 1.913E-06 | global batch size: 16 | lm loss: 7.858832E+00 | loss scale: 4096.0 | grad norm: 26292.591 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 432/ 159576 | consumed samples: 6912 | elapsed time per iteration (ms): 13843.5 | learning rate: 1.917E-06 | global batch size: 16 | lm loss: 7.901114E+00 | loss scale: 4096.0 | grad norm: 65782.734 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 433/ 159576 | consumed samples: 6928 | elapsed time per iteration (ms): 13570.9 | learning rate: 1.922E-06 | global batch size: 16 | lm loss: 8.025250E+00 | loss scale: 4096.0 | grad norm: 99671.911 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 434/ 159576 | consumed samples: 6944 | elapsed time per iteration (ms): 13645.1 | learning rate: 1.926E-06 | global batch size: 16 | lm loss: 7.512252E+00 | loss scale: 4096.0 | grad norm: 55130.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 435/ 159576 | consumed samples: 6960 | elapsed time per iteration (ms): 13607.8 | learning rate: 1.930E-06 | global batch size: 16 | lm loss: 7.858408E+00 | loss scale: 4096.0 | grad norm: 33670.129 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 436/ 159576 | consumed samples: 6976 | elapsed time per iteration (ms): 13679.8 | learning rate: 1.935E-06 | global batch size: 16 | lm loss: 7.844939E+00 | loss scale: 4096.0 | grad norm: 39814.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 437/ 159576 | consumed samples: 6992 | elapsed time per iteration (ms): 13689.9 | learning rate: 1.939E-06 | global batch size: 16 | lm loss: 8.013271E+00 | loss scale: 4096.0 | grad norm: 62672.031 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 438/ 159576 | consumed samples: 7008 | elapsed time per iteration (ms): 13781.3 | learning rate: 1.944E-06 | global batch size: 16 | lm loss: 7.903483E+00 | loss scale: 4096.0 | grad norm: 41414.951 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 439/ 159576 | 
consumed samples: 7024 | elapsed time per iteration (ms): 13527.3 | learning rate: 1.948E-06 | global batch size: 16 | lm loss: 8.131282E+00 | loss scale: 4096.0 | grad norm: 32283.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 440/ 159576 | consumed samples: 7040 | elapsed time per iteration (ms): 13501.3 | learning rate: 1.953E-06 | global batch size: 16 | lm loss: 7.865626E+00 | loss scale: 4096.0 | grad norm: 35041.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 441/ 159576 | consumed samples: 7056 | elapsed time per iteration (ms): 13519.5 | learning rate: 1.957E-06 | global batch size: 16 | lm loss: 7.741554E+00 | loss scale: 4096.0 | grad norm: 36249.919 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 442/ 159576 | consumed samples: 7072 | elapsed time per iteration (ms): 14043.2 | learning rate: 1.962E-06 | global batch size: 16 | lm loss: 7.954229E+00 | loss scale: 4096.0 | grad norm: 73161.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 443/ 159576 | consumed samples: 7088 | elapsed time per iteration (ms): 13566.1 | learning rate: 1.966E-06 | global batch size: 16 | lm loss: 7.943119E+00 | loss scale: 4096.0 | grad norm: 46167.002 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 444/ 159576 | consumed samples: 7104 | elapsed time per iteration (ms): 13755.3 | learning rate: 1.970E-06 | global batch size: 16 | lm loss: 7.861948E+00 | loss scale: 4096.0 | grad norm: 37826.022 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 445/ 159576 | consumed samples: 7120 | elapsed time per iteration (ms): 13434.4 | learning rate: 1.975E-06 | global batch size: 16 | lm loss: 7.838496E+00 | loss scale: 4096.0 | grad norm: 56817.525 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 446/ 159576 | consumed samples: 7136 | elapsed time per iteration (ms): 13607.2 | learning rate: 1.979E-06 | global batch size: 16 | lm loss: 7.932389E+00 | loss scale: 4096.0 | grad norm: 38213.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 447/ 159576 | consumed samples: 7152 | elapsed time per iteration (ms): 14012.8 | learning rate: 1.984E-06 | global batch size: 16 | lm loss: 7.808257E+00 | loss scale: 4096.0 | grad norm: 37539.445 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 448/ 159576 | consumed samples: 7168 | elapsed time per iteration (ms): 13428.4 | learning rate: 1.988E-06 | global batch size: 16 | lm loss: 7.818873E+00 | loss scale: 4096.0 | grad norm: 58774.552 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 449/ 159576 | consumed samples: 7184 | elapsed time per iteration (ms): 13533.7 | learning rate: 1.993E-06 | global batch size: 16 | lm loss: 8.147743E+00 | loss scale: 4096.0 | grad norm: 62996.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 450/ 159576 | consumed samples: 7200 | elapsed time per iteration (ms): 13606.8 | learning rate: 1.997E-06 | global batch size: 16 | lm loss: 8.094215E+00 | loss scale: 4096.0 | grad norm: 28180.185 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | time (ms) iteration 451/ 159576 | consumed samples: 7216 | elapsed time per iteration (ms): 14132.6 | learning rate: 2.001E-06 | global batch size: 16 | lm loss: 7.781518E+00 | loss scale: 4096.0 | grad norm: 44504.183 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 452/ 159576 | consumed samples: 7232 | elapsed time per iteration (ms): 13368.4 | learning rate: 2.006E-06 | global batch size: 16 | lm loss: 8.044688E+00 | loss scale: 4096.0 | grad norm: 88794.745 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 453/ 159576 | consumed samples: 7248 | elapsed time per iteration (ms): 13584.3 | learning rate: 2.010E-06 | global batch size: 16 | lm loss: 7.851390E+00 | loss scale: 4096.0 | grad norm: 63860.892 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 454/ 159576 | consumed samples: 7264 | elapsed time per iteration (ms): 13723.9 | learning rate: 2.015E-06 | global batch size: 16 | lm loss: 7.919715E+00 | loss scale: 4096.0 | grad norm: 52314.539 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 455/ 159576 | consumed samples: 7280 | elapsed time per iteration (ms): 13869.1 | learning rate: 2.019E-06 | global batch size: 16 | lm loss: 7.873841E+00 | loss scale: 4096.0 | grad norm: 34440.715 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 456/ 159576 | consumed samples: 7296 | elapsed time per iteration (ms): 13582.9 | learning rate: 2.024E-06 | global batch size: 16 | lm loss: 8.021425E+00 | loss scale: 4096.0 | grad norm: 38108.651 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 457/ 159576 | consumed samples: 7312 | elapsed time per iteration (ms): 13563.2 | learning rate: 2.028E-06 | global batch size: 16 | lm loss: 8.019066E+00 | loss scale: 4096.0 | grad norm: 24882.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 458/ 159576 | consumed samples: 7328 | elapsed time per iteration (ms): 13638.8 | learning rate: 2.033E-06 | global batch size: 16 | lm loss: 8.016552E+00 | loss scale: 4096.0 | grad norm: 20634.945 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 459/ 159576 | consumed samples: 7344 | elapsed time per iteration (ms): 13616.8 | learning rate: 2.037E-06 | global batch size: 16 | lm loss: 7.754219E+00 | loss scale: 4096.0 | grad norm: 43242.810 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 460/ 159576 | consumed samples: 7360 | elapsed time per iteration (ms): 13985.2 | learning rate: 2.041E-06 | global batch size: 16 | lm loss: 7.788671E+00 | loss scale: 4096.0 | grad norm: 38608.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 461/ 159576 | consumed samples: 7376 | elapsed time per iteration (ms): 13736.9 | learning rate: 2.046E-06 | global batch size: 16 | lm loss: 7.806537E+00 | loss scale: 4096.0 | grad norm: 32594.750 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 462/ 159576 | consumed samples: 7392 | elapsed time per iteration (ms): 13386.0 | learning rate: 2.050E-06 | global batch size: 16 | lm loss: 7.940393E+00 | loss 
scale: 4096.0 | grad norm: 27037.582 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 463/ 159576 | consumed samples: 7408 | elapsed time per iteration (ms): 13564.9 | learning rate: 2.055E-06 | global batch size: 16 | lm loss: 7.988055E+00 | loss scale: 4096.0 | grad norm: 27394.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 464/ 159576 | consumed samples: 7424 | elapsed time per iteration (ms): 14013.6 | learning rate: 2.059E-06 | global batch size: 16 | lm loss: 8.004810E+00 | loss scale: 4096.0 | grad norm: 43759.686 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 465/ 159576 | consumed samples: 7440 | elapsed time per iteration (ms): 13546.2 | learning rate: 2.064E-06 | global batch size: 16 | lm loss: 7.704327E+00 | loss scale: 4096.0 | grad norm: 30191.115 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 466/ 159576 | consumed samples: 7456 | elapsed time per iteration (ms): 13671.9 | learning rate: 2.068E-06 | global batch size: 16 | lm loss: 7.774131E+00 | loss scale: 4096.0 | grad norm: 26963.554 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 467/ 159576 | consumed samples: 7472 | elapsed time per iteration (ms): 13643.6 | learning rate: 2.072E-06 | global batch size: 16 | lm loss: 7.856277E+00 | loss scale: 4096.0 | grad norm: 19255.502 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 468/ 159576 | consumed samples: 7488 | elapsed time per iteration (ms): 13848.0 | learning rate: 2.077E-06 | global batch size: 16 | lm loss: 7.999278E+00 | loss scale: 4096.0 | grad norm: 61835.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 469/ 159576 | consumed samples: 7504 | elapsed time per iteration (ms): 13946.4 | learning rate: 2.081E-06 | global batch size: 16 | lm loss: 7.747583E+00 | loss scale: 4096.0 | grad norm: 42910.556 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 470/ 159576 | consumed samples: 7520 | elapsed time per iteration (ms): 13471.2 | learning rate: 2.086E-06 | global batch size: 16 | lm loss: 7.847405E+00 | loss scale: 4096.0 | grad norm: 29043.806 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 471/ 159576 | consumed samples: 7536 | elapsed time per iteration (ms): 13595.6 | learning rate: 2.090E-06 | global batch size: 16 | lm loss: 7.886540E+00 | loss scale: 4096.0 | grad norm: 22573.188 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 472/ 159576 | consumed samples: 7552 | elapsed time per iteration (ms): 13582.6 | learning rate: 2.095E-06 | global batch size: 16 | lm loss: 7.949501E+00 | loss scale: 4096.0 | grad norm: 81307.755 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 473/ 159576 | consumed samples: 7568 | elapsed time per iteration (ms): 13977.1 | learning rate: 2.099E-06 | global batch size: 16 | lm loss: 7.798001E+00 | loss scale: 4096.0 | grad norm: 27221.701 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 474/ 159576 | consumed samples: 7584 | elapsed time per iteration (ms): 13666.7 | learning 
iteration 474/ 159576 | consumed samples: 7584 | elapsed time per iteration (ms): 13666.7 | learning rate: 2.104E-06 | global batch size: 16 | lm loss: 7.990824E+00 | loss scale: 4096.0 | grad norm: 50253.500 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 474 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
[2021-09-24 04:00:46,754] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/global_step474/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 474 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
time (ms) | save-checkpoint: 17639.87
[exiting program after 110.0032222946485 minutes] datetime: 2021-09-24 04:00:58
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
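The notice above is emitted once per spawned rank by the launcher, which pins OMP_NUM_THREADS=1 unless told otherwise. A minimal sketch of overriding that default, assuming a hypothetical layout where LOCAL_WORLD_SIZE ranks share one node (neither that variable's presence nor the fallback of 8 comes from this log):

    import os

    # Must run before importing torch/numpy so the OpenMP runtime sees it.
    # LOCAL_WORLD_SIZE is assumed to be exported by the launcher; the
    # fallback of 8 ranks per node is a hypothetical topology.
    ranks_per_node = int(os.environ.get("LOCAL_WORLD_SIZE", "8"))
    cores = os.cpu_count() or 1
    os.environ.setdefault("OMP_NUM_THREADS", str(max(1, cores // ranks_per_node)))

Dividing the node's cores evenly across local ranks is one reasonable starting point; the right value still has to be tuned per workload, as the warning says.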
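The per-iteration records earlier in this log carry enough to derive throughput: consumed samples advances by the global batch size each step (381 × 16 = 6096), so at roughly 13.7 s per iteration the run moves about 1.2 samples/s. A small parsing sketch, assuming the exact field labels shown in those records:

    import re

    # Field labels match the Megatron-style iteration records in this log.
    RECORD = re.compile(
        r"iteration\s+(\d+)/\s*\d+.*?"
        r"elapsed time per iteration \(ms\): ([\d.]+).*?"
        r"global batch size: (\d+).*?"
        r"lm loss: ([\dE+.-]+)"
    )

    def throughput(line: str):
        """Return (iteration, samples/sec, lm loss) for one record, or None."""
        m = RECORD.search(line)
        if m is None:
            return None
        it, ms, bsz, loss = int(m[1]), float(m[2]), int(m[3]), float(m[4])
        return it, bsz / (ms / 1000.0), loss

    rec = ("iteration 381/ 159576 | consumed samples: 6096 | "
           "elapsed time per iteration (ms): 13731.7 | learning rate: 1.691E-06 | "
           "global batch size: 16 | lm loss: 7.998044E+00")
    print(throughput(rec))  # (381, ~1.165 samples/s, 7.998044)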
***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... 
[YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- sparse_attn ............ [NO] ....... [OKAY] op name ................ installed .. compatible transformer ............ [NO] ....... [OKAY] -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] stochastic_transformer . [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... 
[OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. 
compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... 
Alongside the op report, each rank emitted the same async_io warning and extension summary:

 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
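The warning is benign unless asynchronous (e.g. NVMe-offload) I/O is actually wanted. If the op were needed, the fix is the one the log suggests, plus a reinstall so the op gets compiled in. A sketch, assuming root access for apt and using DeepSpeed's documented DS_BUILD_AIO install-time flag instead of relying on JIT:

    apt install libaio-dev
    DS_BUILD_AIO=1 pip install --no-cache-dir deepspeed

On a shared cluster without root on the compute nodes, libaio would have to come from the system image or a locally installed prefix instead.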
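Note that in the table above only cpu_adam shows installed = [YES]: it was presumably pre-built when this DeepSpeed wheel was compiled, while the remaining ops are left for ninja to JIT-compile on first use. If JIT compilation at startup ever became a bottleneck, individual ops can be pre-built at install time with DeepSpeed's DS_BUILD_* flags; a sketch, with flag names as documented by DeepSpeed:

    DS_BUILD_CPU_ADAM=1 DS_BUILD_FUSED_ADAM=1 pip install --no-cache-dir deepspeed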
[OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... 
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
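Every rank prints this same op report at startup, which is why the raw log is dozens of interleaved copies of one block. To regenerate the report on a node, DeepSpeed ships a diagnostic entry point; a minimal sketch, assuming the `ds_report` console script installed with DeepSpeed is on `PATH` in the training environment:

```python
# Minimal sketch: re-run the DeepSpeed op/environment report shown above.
# Shelling out to the `ds_report` console script avoids depending on any
# particular Python-side reporting API.
import subprocess

subprocess.run(["ds_report"], check=True)
```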
[WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
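The async_io warning is benign for this training, since the op is only JIT-built if it is actually used, but a quick way to check whether a node could build it is to look for libaio itself (the headers come from `apt install libaio-dev`, as the message says). A standard-library-only sketch:

```python
# Check whether the libaio runtime library is visible on this node. This only
# proves the shared library is installed; JIT-building async_io additionally
# needs the development headers from the libaio-dev package.
import ctypes.util

lib = ctypes.util.find_library("aio")
print("libaio found:" if lib else "libaio missing", lib or "")
```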
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
[OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] async_io ............... [NO] ....... [NO] fused_adam ............. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer_inference .. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] utils .................. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] transformer ............ [NO] ....... 
[OKAY] transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- stochastic_transformer . [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- ninja .................. [OKAY] JIT compiled ops requires ninja -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. 
compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer async_io.............. ...............[NO] [NO]....... .......[OKAY] [NO] -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- utils .................. [YES] ...... [OKAY] op name ................ installed .. compatible quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- fused_lamb ............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- sparse_attn ............ [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] transformer ............ [NO] ....... [OKAY] JIT compiled ops requires ninja fused_adam ............. [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ninja .................. [OKAY] torch version .................... 1.8.1 torch cuda version ............... 11.1 -------------------------------------------------- nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] op name ................ installed .. compatible -------------------------------------------------- deepspeed info ................... 0.4.2+bc17042, bc17042, big-science cpu_adam ............... [YES] ...... [OKAY] deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... 
[OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] sparse_attn ............ [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** transformer_inference .. [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science async_io ............... [NO] ....... [NO]transformer_inference deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 .. [NO] ....... [OKAY] utilstransformer_inference .................... [YES][NO] ............. [OKAY][OKAY] /bin/sh: line 0: type: git: not found quantizer ..............utils [NO].................. .......[YES] [OKAY]...... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system **** Git info for Megatron: git_hash=unknown git_branch=unknown **** meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- fused_lamb ............. [NO] ....... 
[OKAY] op name ................ installed .. compatible sparse_attn ............ [NO] ....... [OKAY] -------------------------------------------------- transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: /bin/sh: line 0: type: git: not found torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 DeepSpeed general environment info: torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version DeepSpeed general environment info:..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']torch install path deepspeed info............... ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ......['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch 1.8, cuda 11.1 torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 
0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system utils .................. [YES] ...... [OKAY] meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: DeepSpeed general environment info:torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version ....................['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] 1.8.1 torch versiontorch cuda version ................................... 1.8.111.1 nvcc versiontorch cuda version .................................... 
11.211.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. /bin/sh: line 0: type: git: not found deepspeed install pathnvcc version ................................ 11.2['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed install pathdeepspeed info .............................. 0.4.2+bc17042, bc17042, big-science['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] async_io ............... [NO] ....... [NO] ninja .................. [OKAY] deepspeed wheel compiled w.deepspeed info ......................... torch 1.8, cuda 11.10.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- utils .................. [YES] ...... [OKAY] op name ................ installed .. compatible -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] -------------------------------------------------- fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed info deepspeed info................... ................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
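The `/bin/sh: line 0: type: git: not found` line and the `git_hash=unknown git_branch=unknown` banner are emitted once per rank: the launcher probes for a git binary through the shell and falls back to "unknown" when none is available on the compute nodes. Roughly this pattern reproduces it (a sketch of the fallback logic, not Megatron's exact code):

    # Probe git through the shell; fall back to "unknown" when git is absent
    # or the working directory is not a repository.
    git_hash=$(git rev-parse --short HEAD 2>/dev/null || echo unknown)
    git_branch=$(git rev-parse --abbrev-ref HEAD 2>/dev/null || echo unknown)
    echo "**** Git info for Megatron: git_hash=${git_hash} git_branch=${git_branch} ****"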
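Every rank prints an identical copy of the op report and environment info above, which is why the raw log repeats them dozens of times. The same report can be regenerated on a single node, with the tr1-13B conda env activated and no job launched, via DeepSpeed's bundled diagnostic tool (assuming the installed wheel put its console scripts on PATH):

    # Prints the C++/CUDA extension op report plus the general environment
    # info for whichever python/deepspeed is currently active.
    ds_report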
[OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 fused_adam ............. [NO] ....... [OKAY] torch cuda version ............... 11.1 fused_lamb ............. [NO] ....... [OKAY] nvcc version ..................... 11.2 sparse_attn ............ [NO] ....... [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science transformer ............ [NO] ....... [OKAY] deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ...............async_io [NO]............... .......[NO] [NO]....... [NO] transformer_inference ..transformer_inference [NO].. .......[NO] [OKAY]....... [OKAY] utils .................. utils[YES] ........................ [YES][OKAY] ...... [OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] ----------------------------------------------------------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] /bin/sh: line 0: type: git: not found transformer_inference .. [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. 
compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference .. [NO] ....... [OKAY] transformer_inferenceutils .................... [NO][YES] ............. [OKAY][OKAY] quantizer .............. utils[NO] ......................... [OKAY][YES] ...... [OKAY] -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- DeepSpeed general environment info: op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 fused_adam ............. [NO] ....... [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science fused_lamb ............. [NO] ....... [OKAY] deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. stochastic_transformer . [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] stochastic_transformer . [NO] ....... [OKAY] transformer_inference .. [NO] ....... 
[OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- stochastic_transformer . [NO] ....... [OKAY] JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] .......-------------------------------------------------- [OKAY] DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 
11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- sparse_attn ............ [NO] ....... [OKAY] JIT compiled ops requires ninja transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] DeepSpeed general environment info: utils .................. [YES] ...... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] quantizer .............. [NO] ....... [OKAY] torch version .................... 1.8.1 -------------------------------------------------- torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] ninja .................. [OKAY] transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- utils .................. [YES] ...... [OKAY] cpu_adam ............... [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] DeepSpeed general environment info: fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 sparse_attn ............ [NO] ....... [OKAY] torch cuda version ............... 11.1 nvcc version ..................... 11.2 transformer ............ [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ...............async_io [NO]............... .......[NO] [NO]....... [NO] transformer_inference transformer_inference.. ..[NO] [NO]....... .......[OKAY] [OKAY] utils utils.................. ..................[YES] [YES]...... ......[OKAY] [OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: transformer_inference .. [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 utils .................. [YES] ...... [OKAY] nvcc version ..................... 11.2 quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. 
Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version .................... ....................1.8.1 1.8.1 torch cuda version torch cuda version............... ...............11.1 11.1 nvcc version nvcc version..................... .....................11.2 11.2deepspeed install path deepspeed install path........... ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed info  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. ...................deepspeed info 0.4.2+bc17042, bc17042, big-science................... 0.4.2+bc17042, bc17042, big-sciencedeepspeed wheel compiled w. deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] async_io ............... [NO]async_io ....... ...............[NO] [NO] ....... [NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizerquantizer ............................ [NO][NO] ....... .......[OKAY] [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. utils .................. [YES] ...... [OKAY] async_io ...............async_io [NO] ...................... [NO][NO] ....... [NO] quantizer .............. [NO] ....... [OKAY] transformer_inference .. transformer_inference[NO] ......... [NO][OKAY] ....... [OKAY] -------------------------------------------------- utils .................. utils[YES] ........................ [YES][OKAY] ...... [OKAY] quantizer .............. [NO]quantizer ..................... [OKAY][NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. /bin/sh: line 0: type: git: not found torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] async_io ............... [NO] ....... [NO] torch version .................... 1.8.1 transformer_inference .. [NO] ....... [OKAY] torch cuda version ............... 11.1 nvcc version ..................... 11.2 utils .................. [YES] ...... [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science quantizer .............. [NO] ....... [OKAY] deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ...............DeepSpeed general environment info: ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch install path ...............torch version .................... 1.8.1 ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']torch cuda version ............... 11.1torch version nvcc version.................... 
.....................1.8.1 11.2 torch cuda versiondeepspeed install path .......................... 11.1 ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] nvcc version deepspeed info..................... ...................11.2 0.4.2+bc17042, bc17042, big-science deepspeed install path deepspeed wheel compiled w............ ...... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']torch 1.8, cuda 11.1 deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed general environment info: -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. sparse_attn ............ [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`....... [OKAY] DeepSpeed general environment info: quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] async_io ............... 
[NO] ....... [NO] ninja .................. [OKAY] torch version .................... 1.8.1 transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- torch cuda version ............... 11.1 op name ................ installed .. compatible -------------------------------------------------- nvcc version ..................... 11.2 utils .................. [YES] ...... [OKAY] cpu_adam ............... [YES] ...... [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] quantizer .............. [NO] ....... [OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] .......async_io [NO] ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference utils.. ..................[NO] [YES]....... ......[OKAY] [OKAY] quantizerutils ................................ [NO][YES] ............. [OKAY][OKAY] -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 /bin/sh: line 0: type: git: not found deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version .....................DeepSpeed general environment info: 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] torch install pathdeepspeed info .................................. 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference .. transformer_inference[NO] ......... [OKAY][NO] ....... [OKAY] utils .................. [YES]utils ........................ [OKAY][YES] ...... 
[OKAY] quantizer .............. quantizer[NO] ..................... [NO][OKAY] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] ninja .................. [OKAY] transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- utils .................. [YES] ...... [OKAY] op name ................ installed .. compatible -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] -------------------------------------------------- fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found DeepSpeed general environment info: /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] DeepSpeed general environment info: torch version .................... 1.8.1 torch cuda version ............... 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** nvcc version ..................... 
11.2 torch version .................... 1.8.1 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch cuda version ............... 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] torch version .................... 1.8.1 deepspeed info ................... 0.4.2+bc17042, bc17042, big-science torch cuda version ............... 11.1 deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer_inference .. [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] utils .................. [YES] ...... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.transformer_inference .. quantizer .............. [NO] ....... [OKAY] [NO] ....... [OKAY] -------------------------------------------------- utils .................. [YES] async_io...... [OKAY]............... [NO] .......quantizer [NO] .............. [NO] ....... [OKAY] --------------------------------------------------transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
[WARNING] async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
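Note that `async_io` is the only op reported as incompatible ([NO] in the second column): it cannot be JIT-built without the libaio development headers, whereas the other [NO] ops are merely not pre-built. A hedged check for the runtime library (the build-time headers are what `apt install libaio-dev` adds):

    import ctypes.util

    # find_library locates the shared object; if even the runtime
    # library is absent, the async_io extension certainly cannot build.
    if ctypes.util.find_library("aio") is None:
        print("libaio not found -- async_io cannot be JIT-built")
    else:
        print("libaio runtime present; install libaio-dev headers to build async_io")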
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
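Both the op table and this environment summary are what DeepSpeed's `ds_report` utility prints, so the same check can be re-run after any toolchain change (sketch; assumes DeepSpeed is installed so the console script is on `$PATH`):

    import subprocess

    # Re-emit the extension op report and general environment info.
    subprocess.run(["ds_report"], check=True)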
Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [OKAY] -------------------------------------------------- async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] /bin/sh: line 0: type: git: not found transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] DeepSpeed general environment info: stochastic_transformer . [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 
11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... DeepSpeed general environment info:11.1 nvcc version ..................... 11.2 deepspeed install path ...........torch install path ...............['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] DeepSpeed general environment info: transformer_inference .. [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 utils .................. [YES] ...... [OKAY] nvcc version ..................... 11.2 quantizer .............. [NO] ....... [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science -------------------------------------------------- deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version torch version.................... ....................1.8.1 1.8.1 torch cuda version torch cuda version............... ...............11.1 nvcc version11.1 .....................nvcc version 11.2..................... deepspeed install path11.2 ........... deepspeed install path ...........['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... ...................torch 1.8, cuda 11.1 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch version .................... 1.8.1 torch cuda version ............... 11.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: DeepSpeed general environment info:torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] .................... 1.8.1torch version .................... torch cuda version1.8.1 ............... 11.1 torch cuda versionnvcc version .................................... 11.111.2 nvcc versiondeepspeed install path ................................ 11.2 ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed install path deepspeed info........... ................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']0.4.2+bc17042, bc17042, big-science deepspeed info deepspeed wheel compiled w.................... ......0.4.2+bc17042, bc17042, big-science torch 1.8, cuda 11.1 deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. 
[YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 
0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... DeepSpeed general environment info: ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch install pathtorch version .................... ...............1.8.1 torch cuda version ............... 11.1['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] nvcc version .....................torch version 11.2.................... deepspeed install path1.8.1 ........... torch cuda version['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ............... deepspeed info11.1 ................... nvcc version0.4.2+bc17042, bc17042, big-science ..................... deepspeed wheel compiled w.11.2 ...... deepspeed install pathtorch 1.8, cuda 11.1 ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda version torch cuda version............... ...............11.1 11.1nvcc version .....................nvcc version 11.2..................... deepspeed install path11.2 ...........deepspeed install path ...........['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ................... deepspeed info0.4.2+bc17042, bc17042, big-science ...................deepspeed wheel compiled w. 0.4.2+bc17042, bc17042, big-science...... deepspeed wheel compiled w.torch 1.8, cuda 11.1 ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: DeepSpeed general environment info:torch install path ............... torch install path ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']............... torch version .................... 1.8.1['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch cuda version torch version............... ....................11.1 1.8.1nvcc version .....................torch cuda version 11.2............... 
deepspeed install path11.1 ........... nvcc version ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']..................... 11.2deepspeed info ...................deepspeed install path 0.4.2+bc17042, bc17042, big-science........... deepspeed wheel compiled w.['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ......deepspeed info torch 1.8, cuda 11.1................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** torch version .................... 1.8.1 torch cuda version ............... 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. DeepSpeed general environment info: async_io ............... [NO] ....... [NO] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] transformer_inference .. [NO] ....... [OKAY] torch version .................... 1.8.1 torch cuda version ............... 11.1 utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] nvcc version ..................... 11.2 -------------------------------------------------- deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io async_io............... ...............[NO] [NO]....... .......[NO] [NO] transformer_inference .. [NO] transformer_inference....... ..[OKAY] /bin/sh: line 0: type: git: not found [NO] ....... [OKAY] DeepSpeed general environment info: utils .................. [YES] utils...... [OKAY].................. torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] [YES] ...... quantizer[OKAY] .............. [NO] .......quantizer [OKAY].............. torch version .................... 1.8.1 [NO] ....... [OKAY]-------------------------------------------------- -------------------------------------------------- torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 ninja .................. [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] -------------------------------------------------- deepspeed info ................... 0.4.2+bc17042, bc17042, big-science op name ................ installed .. compatible deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] ninja .................. [OKAY] -------------------------------------------------- fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] op name ................ installed .. compatible -------------------------------------------------- sparse_attn ............ [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report /bin/sh: line 0: type: git: not found -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 
0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- ninja .................. [OKAY] cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] -------------------------------------------------- fused_lamb ............. [NO] ....... [OKAY] op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found stochastic_transformer . [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. 
[WARNING] async_io requires the libraries: ['libaio-dev'] but they are missing. This can be fixed with: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
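The warning is harmless for this run: async_io simply ends up [NO]/[NO] and training proceeds without it. Where the op is actually wanted, the fix is the one the warning itself names; a sketch assuming a Debian/Ubuntu node with root access (on a shared cluster the package would have to be requested from the admins):

    # Install the headers the async_io op links against.
    sudo apt install libaio-dev

    # Re-run the report; async_io should now be marked compatible.
    ds_report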
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
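The same numbers can be cross-checked from inside the environment, which is a quick way to confirm that the wheel (compiled w. torch 1.8, cuda 11.1) matches the runtime it is loaded into. A minimal sketch, assuming the tr1-13B conda environment is active:

    # These are the values the report prints as "torch version",
    # "torch cuda version" and "deepspeed info".
    python -c 'import torch, deepspeed; print(torch.__version__, torch.version.cuda, deepspeed.__version__)'

    # The system CUDA compiler, reported above as "nvcc version".
    nvcc --version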
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] cpu_adam ............... [YES] ...... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** deepspeed info ................... 0.4.2+bc17042, bc17042, big-science fused_adam ............. [NO] ....... [OKAY] deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found fused_lamb ............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- sparse_attn NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op............. --------------------------------------------------[NO] JIT compiled ops requires ninja....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- ninjaop name .................................. installed ..[OKAY] compatible ---------------------------------------------------------------------------------------------------- op name ................ installed .. cpu_adamcompatible ............... --------------------------------------------------[YES] ...... [OKAY] cpu_adam ............... [YES]fused_adam ................... [NO][OKAY] ....... [OKAY] fused_lamb ............. [NO] ....... fused_adam[OKAY] ............. [NO] ....... [OKAY] fused_lamb sparse_attn............. ............[NO] [NO] .............. [OKAY][OKAY] transformer ............ [NO] ....... [OKAY] sparse_attn ............stochastic_transformer [NO] ........ [NO] [OKAY]....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found stochastic_transformer . [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found ninja .................. [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . 
[NO] ....... [OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io async_io............... ...............[NO] .......[NO] [NO]....... [NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utils utils.................. ..................[YES] [YES]...... ......[OKAY] [OKAY] quantizer quantizer.............. ..............[NO] [NO]....... .......[OKAY] [OKAY] -------------------------------------------------- -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] transformer_inference .. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- utils .................. [YES] ...... 
[OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2DeepSpeed general environment info: deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] torch install pathdeepspeed info .................................. 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer_inference .. [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] utils .................. [YES] ...... [OKAY] transformer_inference .. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference utils.. ..................[NO] [YES]....... ......[OKAY] [OKAY] quantizer utils.............. ..................[NO] [YES]....... ......[OKAY] [OKAY] -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] .......async_io [NO] ............... [NO] ....... [NO] transformer_inference transformer_inference.. ..[NO] [NO]....... .......[OKAY] [OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] DeepSpeed general environment info: quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. DeepSpeed general environment info: async_io ............... [NO] ....... [NO] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] transformer_inference .. [NO] ....... [OKAY] torch version .................... 1.8.1 utils .................. [YES] ...... [OKAY] /bin/sh: line 0: type: git: not found torch cuda version ............... 11.1 quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 
11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. .............. [NO] ....... [OKAY] -------------------------------------------------- async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... 
[NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science **** Git info for Megatron: git_hash=unknown git_branch=unknown **** deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 
11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 
11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meets the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
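The op report and environment info above can be regenerated by hand outside of a training run, since DeepSpeed ships them as a standalone command. A minimal sketch, assuming the conda environment whose paths appear above is available (the env name is taken from those paths and may differ on your setup):

    # regenerate the op report and general environment info by hand;
    # "tr1-13B" is the env name suggested by the install paths above
    conda activate tr1-13B
    ds_report   # ships with DeepSpeed; prints the op table and environment info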
[WARNING] async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
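The async_io op is only exercised by DeepSpeed's NVMe offloading, so its [NO] status is harmless unless offload is enabled; still, the warning's own suggestion can be applied where you have root. A hedged sketch of that fix:

    # the fix suggested by the warning itself; needs root and a Debian/Ubuntu-style
    # node -- on a managed cluster, ask the admins to install libaio instead
    sudo apt install libaio-dev
    ds_report   # re-run the report; async_io should then show as compatible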
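The repeated `/bin/sh: line 0: type: git: not found` lines mean git is simply absent from the compute nodes' PATH, so Megatron falls back to git_hash=unknown. The probe is roughly equivalent to the sketch below; it is not Megatron's exact code, and the module-load remedy is an assumption about module-based clusters:

    # roughly what the launcher does to collect git info (a sketch, not exact code)
    if type git >/dev/null 2>&1; then
        git rev-parse --short HEAD   # would fill in git_hash
    else
        echo "git_hash=unknown git_branch=unknown"
    fi
    # assumption: on module-based clusters, something like `module load git`
    # on the compute nodes would make the hash/branch resolve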
> setting tensorboard ...
[WARNING] async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] .......
[OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- op nameop nameop nameop name ................................................................ installedinstalledinstalled installed ...... .. compatible compatible compatiblecompatible---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adam cpu_adam............... ...............cpu_adam[YES] cpu_adam [YES] ...... [OKAY]............... ..................... [YES][OKAY][YES] ............ fused_adam[OKAY][OKAY] ............. [NO] ....... fused_adam[OKAY] ............. [NO]fused_lambfused_adam fused_adam .................... ............. .............[OKAY] [NO] [NO] [NO] .......fused_lamb ....... .................... [OKAY][NO] [OKAY] [OKAY]....... [OKAY] fused_lamb fused_lamb............. ............. [NO][NO] .......sparse_attn....... [OKAY] ............ [OKAY]sparse_attn [NO] ............ .......[NO] [OKAY]....... [OKAY] transformer ............sparse_attntransformer sparse_attn [NO]........................ ...................[NO][NO] [NO].......[OKAY]....... .......[OKAY][OKAY] stochastic_transformer[OKAY] transformer stochastic_transformer.............transformer [NO][NO]. ............ ....... [NO]....... [NO] [OKAY] ....... [OKAY].......[OKAY] [OKAY] stochastic_transformer stochastic_transformer. [NO] ........ [NO][OKAY] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ...............async_io [NO] ...................... [NO][NO] ....... [NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizer ..............quantizer [NO].............. .......[NO] [OKAY]....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO] [NO]....... .......[NO] [NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utils utils.................. ..................[YES] [YES]...... ......[OKAY] [OKAY] quantizer quantizer.............. ..............[NO] [NO]....... .......[OKAY] [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utilstransformer_inference .................... [YES][NO] ............. [OKAY][OKAY] quantizer .............. utils[NO] ......................... [YES][OKAY] ...... [OKAY] -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... 
[OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ 
torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... DeepSpeed general environment info: ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version torch install path.................... 1.8.1............... torch cuda version ............... 11.1 ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']nvcc version  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. ..................... torch version11.2 ....................deepspeed install path 1.8.1........... async_io ............... [NO] ....... [NO] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']torch cuda version ...............deepspeed info 11.1................... 0.4.2+bc17042, bc17042, big-sciencenvcc version deepspeed wheel compiled w...................... ......11.2 torch 1.8, cuda 11.1deepspeed install path transformer_inference .. [NO] ....... [OKAY] ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] utils .................. [YES] ...... [OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja-------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY] [OKAY] [OKAY][OKAY] -------------------------------------------------- ---------------------------------------------------------------------------------------------------- -------------------------------------------------- op nameop name op name ................op name................................ ................installedinstalled installed installed .... ....compatible compatible compatible-------------------------------------------------- compatible -------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adam cpu_adam............... cpu_adam............... [YES]cpu_adam ............... [YES]..................... [OKAY][YES] ............ [YES] [OKAY] [OKAY] ...... [OKAY] fused_adam ............. fused_adam[NO]fused_adam ....... fused_adam............. ............. [OKAY] [NO] .............[NO] ....... fused_lamb[NO] [OKAY] ............. ....... ....... [NO] [OKAY] [OKAY] fused_lamb....... .............[OKAY] [NO]fused_lambfused_lamb ................................. [OKAY][NO][NO] .............. sparse_attn[OKAY][OKAY] ............ [NO] ....... sparse_attn[OKAY] ............ [NO]transformer ................... [OKAY]sparse_attn[NO]sparse_attn ...............................transformer [OKAY][NO]............[NO] ....... [NO] .......[OKAY]stochastic_transformer [OKAY]....... .transformer[OKAY] transformer[NO] ............................... stochastic_transformer[NO][OKAY] [NO] ............... [NO] [OKAY] [OKAY] ....... [OKAY]stochastic_transformer stochastic_transformer . [NO]. .......[NO] [OKAY]....... [OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
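The op-compatibility table and environment info above are DeepSpeed's standard startup report, printed once per rank. The same report can usually be regenerated offline with the `ds_report` console script that ships with DeepSpeed — a minimal sketch, assuming the tr1-13B conda environment is active and `ds_report` is on PATH:

    # Minimal sketch: regenerate DeepSpeed's op-compatibility report.
    # Assumes the `ds_report` console script (installed with DeepSpeed)
    # is on PATH in the active environment.
    import subprocess

    subprocess.run(["ds_report"], check=True)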
using world size: 256, data-parallel-size: 8, tensor-model-parallel size: 4, pipeline-model-parallel size: 8
using torch.float16 for parameters ...
------------------------ arguments ------------------------
  accumulate_allreduce_grads_in_fp32 .............. False
  adam_beta1 ...................................... 0.9
  adam_beta2 ...................................... 0.999
  adam_eps ........................................ 1e-08
  adlr_autoresume ................................. False
  adlr_autoresume_interval ........................ 1000
  apply_query_key_layer_scaling ................... True
  apply_residual_connection_post_layernorm ........ False
  attention_dropout ............................... 0.1
  attention_softmax_in_fp32 ....................... False
  bert_binary_head ................................ True
  bert_load ....................................... None
  bf16 ............................................ False
  bias_dropout_fusion ............................. True
  bias_gelu_fusion ................................ True
  biencoder_projection_dim ........................ 0
  biencoder_shared_query_context_model ............ False
  block_data_path ................................. None
  checkpoint_activations .......................... True
  checkpoint_in_cpu ............................... False
  checkpoint_num_layers ........................... 1
  clip_grad ....................................... 1.0
  codecarbon_dir .................................. /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/tr8-104B-logs/codecarbon
  consumed_train_samples .......................... 0
  consumed_valid_samples .......................... 0
  contigious_checkpointing ........................ False
  cpu_optimizer ................................... False
  cpu_torch_adam .................................. False
  data_impl ....................................... mmap
  data_parallel_size .............................. 8
  data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
  dataloader_type ................................. single
  DDP_impl ........................................ local
  decoder_seq_length .............................. None
  deepscale ....................................... False
  deepscale_config ................................ None
  deepspeed ....................................... True
  deepspeed_activation_checkpointing .............. True
  deepspeed_config ................................ ./ds_config.1162747.json
  deepspeed_mpi ................................... False
  distribute_checkpointed_activations ............. False
  distributed_backend ............................. nccl
  embedding_path .................................. None
  encoder_seq_length .............................. 2048
  eod_mask_loss ................................... False
  eval_interval ................................... 1000
  eval_iters ...................................... 5
  evidence_data_path .............................. None
  exit_duration_in_mins ........................... 110
  exit_interval ................................... None
  ffn_hidden_size ................................. 20480
  finetune ........................................ False
  fp16 ............................................ True
  fp16_lm_cross_entropy ........................... False
  fp32_residual_connection ........................ False
  global_batch_size ............................... 2048
  hidden_dropout .................................. 0.1
  hidden_size ..................................... 16384
  hysteresis ...................................... 2
  ict_head_size ................................... None
  ict_load ........................................ None
  img_dim ......................................... 224
  indexer_batch_size .............................. 128
  indexer_log_interval ............................ 1000
  init_method_std ................................. 0.02
  init_method_xavier_uniform ...................... False
  initial_loss_scale .............................. 4294967296
  kv_channels ..................................... 512
  layernorm_epsilon ............................... 1e-05
  lazy_mpu_init ................................... None
  load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
  local_rank ...................................... 0
  log_batch_size_to_tensorboard ................... True
  log_interval .................................... 1
  log_learning_rate_to_tensorboard ................ True
  log_loss_scale_to_tensorboard ................... True
  log_num_zeros_in_grad ........................... False
  log_params_norm ................................. False
  log_timers_to_tensorboard ....................... True
  log_validation_ppl_to_tensorboard ............... True
  loss_scale ...................................... 12.0
  loss_scale_window ............................... 1000
  lr .............................................. 6e-05
  lr_decay_iters .................................. None
  lr_decay_samples ................................ 126953125
  lr_decay_style .................................. cosine
  lr_warmup_fraction .............................. None
  lr_warmup_iters ................................. 0
  lr_warmup_samples ............................... 216320
  make_vocab_size_divisible_by .................... 128
  mask_prob ....................................... 0.15
  masked_softmax_fusion ........................... True
  max_position_embeddings ......................... 2048
  memory_centric_tiled_linear ..................... False
  merge_file ...................................... /gpfswork/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/data/gpt2-merges.txt
  micro_batch_size ................................ 1
  min_loss_scale .................................. 1.0
  min_lr .......................................... 6e-06
  mmap_warmup ..................................... False
  no_load_optim ................................... None
  no_load_rng ..................................... None
  no_save_optim ................................... None
  no_save_rng ..................................... None
  num_attention_heads ............................. 32
  num_channels .................................... 3
  num_classes ..................................... 1000
  num_layers ...................................... 32
  num_layers_per_virtual_pipeline_stage ........... None
  num_workers ..................................... 2
  onnx_safe ....................................... None
  openai_gelu ..................................... False
  optimizer ....................................... adam
  override_lr_scheduler ........................... False
  params_dtype .................................... torch.float16
  partition_activations ........................... False
  patch_dim ....................................... 16
  pipeline_model_parallel_size .................... 8
  position_embedding_type ......................... PositionEmbeddingType.absolute
  profile_backward ................................ False
  query_in_block_prob ............................. 0.1
  rampup_batch_size ............................... ['16', '16', '6_000_000']
  rank ............................................ 0
  remote_device ................................... none
  reset_attention_mask ............................ False
  reset_position_ids .............................. False
  retriever_report_topk_accuracies ................ []
  retriever_score_scaling ......................... False
  retriever_seq_length ............................ 256
  sample_rate ..................................... 1.0
  save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
  save_interval ................................... 1500
  scatter_gather_tensors_in_pipeline .............. True
  scattered_embeddings ............................ False
  seed ............................................ 42
  seq_length ...................................... 2048
  sgd_momentum .................................... 0.9
  short_seq_prob .................................. 0.1
  split ........................................... 949,50,1
  split_transformers .............................. False
  synchronize_each_layer .......................... False
  tensor_model_parallel_size ...................... 4
  tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/tr8-104B-logs/tensorboard
  tensorboard_log_interval ........................ 1
  tensorboard_queue_size .......................... 5
  tile_factor ..................................... 1
  titles_data_path ................................ None
  tokenizer_name_or_path .......................... None
  tokenizer_type .................................. GPT2BPETokenizer
  train_iters ..................................... None
  train_samples ................................... 300000000
  use_checkpoint_lr_scheduler ..................... False
  use_contiguous_buffers_in_ddp ................... False
  use_cpu_initialization .......................... None
  use_one_sent_docs ............................... False
  use_pin_memory .................................. False
  virtual_pipeline_model_parallel_size ............ None
  vocab_extra_ids ................................. 0
  vocab_file ...................................... /gpfswork/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/data/gpt2-vocab.json
  weight_decay .................................... 0.1
  world_size ...................................... 256
  zero_allgather_bucket_size ...................... 0.0
  zero_contigious_gradients ....................... False
  zero_reduce_bucket_size ......................... 0.0
  zero_reduce_scatter ............................. False
  zero_stage ...................................... 1
-------------------- end of arguments ---------------------
will use batch size rampup starting from global batch size 16 to global batch size 2048 with batch size increments 16 over 6000000 samples.
> building GPT2BPETokenizer tokenizer ...
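The topology and rampup lines above are pure arithmetic: world size 256 factors into data-parallel 8 × tensor-parallel 4 × pipeline-parallel 8, and the global batch size climbs from 16 to 2048 in steps of 16 spread over the first 6,000,000 samples. A sketch of both computations, assuming Megatron-LM's linear rampup semantics (each intermediate batch size is held for an equal share of the rampup samples):

    # Sketch of the parallel topology and batch-size rampup arithmetic.
    # Assumption: each intermediate batch size is held for an equal
    # share of the rampup samples (Megatron-LM linear rampup).
    dp, tp, pp = 8, 4, 8
    assert dp * tp * pp == 256                          # world size

    start_bs, final_bs, incr, ramp_samples = 16, 2048, 16, 6_000_000
    num_increments = (final_bs - start_bs) // incr      # 127 bumps of 16
    samples_per_step = ramp_samples // num_increments   # ~47,244 samples each
    print(num_increments, samples_per_step)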
> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
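The 431 dummy tokens follow from Megatron-LM's padding rule: the vocabulary is rounded up to a multiple of make_vocab_size_divisible_by (128) times the tensor-parallel size (4), so each of the 4 tensor-parallel shards gets an equal, 128-aligned slice of the embedding matrix. A quick check of the arithmetic:

    # Check of the vocab padding, assuming the rule: round up to a
    # multiple of make_vocab_size_divisible_by * tensor_model_parallel_size.
    import math

    orig_vocab, divisible_by, tp = 50257, 128, 4
    multiple = divisible_by * tp                          # 512
    padded = math.ceil(orig_vocab / multiple) * multiple
    print(padded, padded - orig_vocab)                    # 50688 431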
> setting codecarbon ...
> initializing torch distributed ...
> initializing tensor model parallel with size 4
> initializing pipeline model parallel with size 8
> setting random seeds to 42 ...
[2021-09-24 04:01:23,432] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2760 and data parallel seed: 42
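The model-parallel seed of 2760 is derived, not configured: model_parallel_cuda_manual_seed offsets the base seed by a constant (2718) plus the tensor-parallel rank, while the data-parallel RNG keeps the base seed. A sketch of the derivation, assuming that offset logic:

    # Sketch of the seed derivation in model_parallel_cuda_manual_seed.
    # Assumption: model-parallel seed = seed + 2718 + tensor-parallel rank.
    seed, tp_rank = 42, 0
    model_parallel_seed = seed + 2718 + tp_rank   # 2760 on global rank 0
    data_parallel_seed = seed                     # 42
    print(model_parallel_seed, data_parallel_seed)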
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/data'
>>> done with dataset index builder. Compilation time: 0.299 seconds
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
!! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! WARNING !!
warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/fused_kernels/build/build.ninja...
Building extension module fused_mix_prec_layer_norm_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( >>> done with compiling and loading fused kernels. Compilation time: 17.207 seconds time to initialize megatron (seconds): 4.980 [after megatron is initialized] datetime: 2021-09-24 04:01:40 building GPT model ... 
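The compiler-mismatch warning above comes from torch.utils.cpp_extension and is emitted by every rank while Megatron JIT-compiles its fused kernels, which is why the raw log repeats it so many times. A minimal sketch of the kind of JIT build that triggers it; the module name and source file here are illustrative placeholders, not the actual Megatron kernels:

    import os
    from torch.utils import cpp_extension

    # PyTorch itself was built with g++ on linux; cpp_extension picks the
    # compiler from $CXX (falling back to plain "c++") and warns when the
    # two don't match. Exporting CXX=g++ before launch silences the warning.
    os.environ.setdefault("CXX", "g++")

    # Hypothetical fused-kernel build -- Megatron performs the equivalent
    # of this at startup for its fused softmax/layernorm CUDA kernels.
    fused_example = cpp_extension.load(
        name="fused_example",            # illustrative module name
        sources=["fused_example.cpp"],   # illustrative source file
        verbose=True,
    )

The warning is benign here: the extensions still build, as the "done with compiling and loading fused kernels" line confirms.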
[2021-09-24 04:01:41,035] [INFO] [utils.py:680:see_memory_usage] Before Building Model
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/cuda/memory.py:373: FutureWarning: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved
  warnings.warn(
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/cuda/memory.py:381: FutureWarning: torch.cuda.max_memory_cached has been renamed to torch.cuda.max_memory_reserved
  warnings.warn(
[2021-09-24 04:01:41,037] [INFO] [utils.py:681:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
[2021-09-24 04:01:41,037] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 37.36 GB, percent = 20.0%
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=0, data=1, model=0): 4, ProcessCoord(pipe=0, data=1, model=1): 5, ProcessCoord(pipe=0, data=1, model=2): 6, ProcessCoord(pipe=0, data=1, model=3): 7, ProcessCoord(pipe=0, data=2, model=0): 8, ProcessCoord(pipe=0, data=2, model=1): 9, ProcessCoord(pipe=0, data=2, model=2): 10, ProcessCoord(pipe=0, data=2, model=3): 11, ProcessCoord(pipe=0, data=3, model=0): 12, ProcessCoord(pipe=0, data=3, model=1): 13, ProcessCoord(pipe=0, data=3, model=2): 14, ProcessCoord(pipe=0, data=3, model=3): 15, ProcessCoord(pipe=0, data=4, model=0): 16, ProcessCoord(pipe=0, data=4, model=1): 17, ProcessCoord(pipe=0, data=4, model=2): 18, ProcessCoord(pipe=0, data=4, model=3): 19, ProcessCoord(pipe=0, data=5, model=0): 20, ProcessCoord(pipe=0, data=5, model=1): 21, ProcessCoord(pipe=0, data=5, model=2): 22, ProcessCoord(pipe=0, data=5, model=3): 23, ProcessCoord(pipe=0, data=6, model=0): 24, ProcessCoord(pipe=0, data=6, model=1): 25, ProcessCoord(pipe=0, data=6, model=2): 26, ProcessCoord(pipe=0, data=6, model=3): 27, ProcessCoord(pipe=0, data=7, model=0): 28, ProcessCoord(pipe=0, data=7, model=1): 29, ProcessCoord(pipe=0, data=7, model=2): 30, ProcessCoord(pipe=0, data=7, model=3): 31, ProcessCoord(pipe=1, data=0, model=0): 32, ProcessCoord(pipe=1, data=0, model=1): 33, ProcessCoord(pipe=1, data=0, model=2): 34, ProcessCoord(pipe=1, data=0, model=3): 35, ProcessCoord(pipe=1, data=1, model=0): 36, ProcessCoord(pipe=1, data=1, model=1): 37, ProcessCoord(pipe=1, data=1, model=2): 38, ProcessCoord(pipe=1, data=1, model=3): 39, ProcessCoord(pipe=1, data=2, model=0): 40, ProcessCoord(pipe=1, data=2, model=1): 41, ProcessCoord(pipe=1, data=2, model=2): 42, ProcessCoord(pipe=1, data=2, model=3): 43, ProcessCoord(pipe=1, data=3, model=0): 44, ProcessCoord(pipe=1, data=3, model=1): 45, ProcessCoord(pipe=1, data=3, model=2): 46, ProcessCoord(pipe=1, data=3, model=3): 47, ProcessCoord(pipe=1, data=4, model=0): 48, ProcessCoord(pipe=1, data=4, model=1): 49, ProcessCoord(pipe=1, data=4, model=2): 50, ProcessCoord(pipe=1, data=4, model=3): 51, ProcessCoord(pipe=1, data=5, model=0): 52, ProcessCoord(pipe=1, data=5, model=1): 53, ProcessCoord(pipe=1, data=5, model=2): 54, ProcessCoord(pipe=1, data=5, model=3): 55, ProcessCoord(pipe=1, data=6, model=0): 56, ProcessCoord(pipe=1, data=6, model=1): 57, ProcessCoord(pipe=1, data=6, model=2): 58, ProcessCoord(pipe=1, data=6, model=3): 59, ProcessCoord(pipe=1, data=7, model=0): 60, ProcessCoord(pipe=1, data=7, model=1): 61, ProcessCoord(pipe=1, data=7, model=2): 62, ProcessCoord(pipe=1, data=7, model=3): 63, ProcessCoord(pipe=2, data=0,
model=0): 64, ProcessCoord(pipe=2, data=0, model=1): 65, ProcessCoord(pipe=2, data=0, model=2): 66, ProcessCoord(pipe=2, data=0, model=3): 67, ProcessCoord(pipe=2, data=1, model=0): 68, ProcessCoord(pipe=2, data=1, model=1): 69, ProcessCoord(pipe=2, data=1, model=2): 70, ProcessCoord(pipe=2, data=1, model=3): 71, ProcessCoord(pipe=2, data=2, model=0): 72, ProcessCoord(pipe=2, data=2, model=1): 73, ProcessCoord(pipe=2, data=2, model=2): 74, ProcessCoord(pipe=2, data=2, model=3): 75, ProcessCoord(pipe=2, data=3, model=0): 76, ProcessCoord(pipe=2, data=3, model=1): 77, ProcessCoord(pipe=2, data=3, model=2): 78, ProcessCoord(pipe=2, data=3, model=3): 79, ProcessCoord(pipe=2, data=4, model=0): 80, ProcessCoord(pipe=2, data=4, model=1): 81, ProcessCoord(pipe=2, data=4, model=2): 82, ProcessCoord(pipe=2, data=4, model=3): 83, ProcessCoord(pipe=2, data=5, model=0): 84, ProcessCoord(pipe=2, data=5, model=1): 85, ProcessCoord(pipe=2, data=5, model=2): 86, ProcessCoord(pipe=2, data=5, model=3): 87, ProcessCoord(pipe=2, data=6, model=0): 88, ProcessCoord(pipe=2, data=6, model=1): 89, ProcessCoord(pipe=2, data=6, model=2): 90, ProcessCoord(pipe=2, data=6, model=3): 91, ProcessCoord(pipe=2, data=7, model=0): 92, ProcessCoord(pipe=2, data=7, model=1): 93, ProcessCoord(pipe=2, data=7, model=2): 94, ProcessCoord(pipe=2, data=7, model=3): 95, ProcessCoord(pipe=3, data=0, model=0): 96, ProcessCoord(pipe=3, data=0, model=1): 97, ProcessCoord(pipe=3, data=0, model=2): 98, ProcessCoord(pipe=3, data=0, model=3): 99, ProcessCoord(pipe=3, data=1, model=0): 100, ProcessCoord(pipe=3, data=1, model=1): 101, ProcessCoord(pipe=3, data=1, model=2): 102, ProcessCoord(pipe=3, data=1, model=3): 103, ProcessCoord(pipe=3, data=2, model=0): 104, ProcessCoord(pipe=3, data=2, model=1): 105, ProcessCoord(pipe=3, data=2, model=2): 106, ProcessCoord(pipe=3, data=2, model=3): 107, ProcessCoord(pipe=3, data=3, model=0): 108, ProcessCoord(pipe=3, data=3, model=1): 109, ProcessCoord(pipe=3, data=3, model=2): 110, ProcessCoord(pipe=3, data=3, model=3): 111, ProcessCoord(pipe=3, data=4, model=0): 112, ProcessCoord(pipe=3, data=4, model=1): 113, ProcessCoord(pipe=3, data=4, model=2): 114, ProcessCoord(pipe=3, data=4, model=3): 115, ProcessCoord(pipe=3, data=5, model=0): 116, ProcessCoord(pipe=3, data=5, model=1): 117, ProcessCoord(pipe=3, data=5, model=2): 118, ProcessCoord(pipe=3, data=5, model=3): 119, ProcessCoord(pipe=3, data=6, model=0): 120, ProcessCoord(pipe=3, data=6, model=1): 121, ProcessCoord(pipe=3, data=6, model=2): 122, ProcessCoord(pipe=3, data=6, model=3): 123, ProcessCoord(pipe=3, data=7, model=0): 124, ProcessCoord(pipe=3, data=7, model=1): 125, ProcessCoord(pipe=3, data=7, model=2): 126, ProcessCoord(pipe=3, data=7, model=3): 127, ProcessCoord(pipe=4, data=0, model=0): 128, ProcessCoord(pipe=4, data=0, model=1): 129, ProcessCoord(pipe=4, data=0, model=2): 130, ProcessCoord(pipe=4, data=0, model=3): 131, ProcessCoord(pipe=4, data=1, model=0): 132, ProcessCoord(pipe=4, data=1, model=1): 133, ProcessCoord(pipe=4, data=1, model=2): 134, ProcessCoord(pipe=4, data=1, model=3): 135, ProcessCoord(pipe=4, data=2, model=0): 136, ProcessCoord(pipe=4, data=2, model=1): 137, ProcessCoord(pipe=4, data=2, model=2): 138, ProcessCoord(pipe=4, data=2, model=3): 139, ProcessCoord(pipe=4, data=3, model=0): 140, ProcessCoord(pipe=4, data=3, model=1): 141, ProcessCoord(pipe=4, data=3, model=2): 142, ProcessCoord(pipe=4, data=3, model=3): 143, ProcessCoord(pipe=4, data=4, model=0): 144, ProcessCoord(pipe=4, data=4, model=1): 145, 
ProcessCoord(pipe=4, data=4, model=2): 146, ProcessCoord(pipe=4, data=4, model=3): 147, ProcessCoord(pipe=4, data=5, model=0): 148, ProcessCoord(pipe=4, data=5, model=1): 149, ProcessCoord(pipe=4, data=5, model=2): 150, ProcessCoord(pipe=4, data=5, model=3): 151, ProcessCoord(pipe=4, data=6, model=0): 152, ProcessCoord(pipe=4, data=6, model=1): 153, ProcessCoord(pipe=4, data=6, model=2): 154, ProcessCoord(pipe=4, data=6, model=3): 155, ProcessCoord(pipe=4, data=7, model=0): 156, ProcessCoord(pipe=4, data=7, model=1): 157, ProcessCoord(pipe=4, data=7, model=2): 158, ProcessCoord(pipe=4, data=7, model=3): 159, ProcessCoord(pipe=5, data=0, model=0): 160, ProcessCoord(pipe=5, data=0, model=1): 161, ProcessCoord(pipe=5, data=0, model=2): 162, ProcessCoord(pipe=5, data=0, model=3): 163, ProcessCoord(pipe=5, data=1, model=0): 164, ProcessCoord(pipe=5, data=1, model=1): 165, ProcessCoord(pipe=5, data=1, model=2): 166, ProcessCoord(pipe=5, data=1, model=3): 167, ProcessCoord(pipe=5, data=2, model=0): 168, ProcessCoord(pipe=5, data=2, model=1): 169, ProcessCoord(pipe=5, data=2, model=2): 170, ProcessCoord(pipe=5, data=2, model=3): 171, ProcessCoord(pipe=5, data=3, model=0): 172, ProcessCoord(pipe=5, data=3, model=1): 173, ProcessCoord(pipe=5, data=3, model=2): 174, ProcessCoord(pipe=5, data=3, model=3): 175, ProcessCoord(pipe=5, data=4, model=0): 176, ProcessCoord(pipe=5, data=4, model=1): 177, ProcessCoord(pipe=5, data=4, model=2): 178, ProcessCoord(pipe=5, data=4, model=3): 179, ProcessCoord(pipe=5, data=5, model=0): 180, ProcessCoord(pipe=5, data=5, model=1): 181, ProcessCoord(pipe=5, data=5, model=2): 182, ProcessCoord(pipe=5, data=5, model=3): 183, ProcessCoord(pipe=5, data=6, model=0): 184, ProcessCoord(pipe=5, data=6, model=1): 185, ProcessCoord(pipe=5, data=6, model=2): 186, ProcessCoord(pipe=5, data=6, model=3): 187, ProcessCoord(pipe=5, data=7, model=0): 188, ProcessCoord(pipe=5, data=7, model=1): 189, ProcessCoord(pipe=5, data=7, model=2): 190, ProcessCoord(pipe=5, data=7, model=3): 191, ProcessCoord(pipe=6, data=0, model=0): 192, ProcessCoord(pipe=6, data=0, model=1): 193, ProcessCoord(pipe=6, data=0, model=2): 194, ProcessCoord(pipe=6, data=0, model=3): 195, ProcessCoord(pipe=6, data=1, model=0): 196, ProcessCoord(pipe=6, data=1, model=1): 197, ProcessCoord(pipe=6, data=1, model=2): 198, ProcessCoord(pipe=6, data=1, model=3): 199, ProcessCoord(pipe=6, data=2, model=0): 200, ProcessCoord(pipe=6, data=2, model=1): 201, ProcessCoord(pipe=6, data=2, model=2): 202, ProcessCoord(pipe=6, data=2, model=3): 203, ProcessCoord(pipe=6, data=3, model=0): 204, ProcessCoord(pipe=6, data=3, model=1): 205, ProcessCoord(pipe=6, data=3, model=2): 206, ProcessCoord(pipe=6, data=3, model=3): 207, ProcessCoord(pipe=6, data=4, model=0): 208, ProcessCoord(pipe=6, data=4, model=1): 209, ProcessCoord(pipe=6, data=4, model=2): 210, ProcessCoord(pipe=6, data=4, model=3): 211, ProcessCoord(pipe=6, data=5, model=0): 212, ProcessCoord(pipe=6, data=5, model=1): 213, ProcessCoord(pipe=6, data=5, model=2): 214, ProcessCoord(pipe=6, data=5, model=3): 215, ProcessCoord(pipe=6, data=6, model=0): 216, ProcessCoord(pipe=6, data=6, model=1): 217, ProcessCoord(pipe=6, data=6, model=2): 218, ProcessCoord(pipe=6, data=6, model=3): 219, ProcessCoord(pipe=6, data=7, model=0): 220, ProcessCoord(pipe=6, data=7, model=1): 221, ProcessCoord(pipe=6, data=7, model=2): 222, ProcessCoord(pipe=6, data=7, model=3): 223, ProcessCoord(pipe=7, data=0, model=0): 224, ProcessCoord(pipe=7, data=0, model=1): 225, ProcessCoord(pipe=7, data=0, 
model=2): 226, ProcessCoord(pipe=7, data=0, model=3): 227, ProcessCoord(pipe=7, data=1, model=0): 228, ProcessCoord(pipe=7, data=1, model=1): 229, ProcessCoord(pipe=7, data=1, model=2): 230, ProcessCoord(pipe=7, data=1, model=3): 231, ProcessCoord(pipe=7, data=2, model=0): 232, ProcessCoord(pipe=7, data=2, model=1): 233, ProcessCoord(pipe=7, data=2, model=2): 234, ProcessCoord(pipe=7, data=2, model=3): 235, ProcessCoord(pipe=7, data=3, model=0): 236, ProcessCoord(pipe=7, data=3, model=1): 237, ProcessCoord(pipe=7, data=3, model=2): 238, ProcessCoord(pipe=7, data=3, model=3): 239, ProcessCoord(pipe=7, data=4, model=0): 240, ProcessCoord(pipe=7, data=4, model=1): 241, ProcessCoord(pipe=7, data=4, model=2): 242, ProcessCoord(pipe=7, data=4, model=3): 243, ProcessCoord(pipe=7, data=5, model=0): 244, ProcessCoord(pipe=7, data=5, model=1): 245, ProcessCoord(pipe=7, data=5, model=2): 246, ProcessCoord(pipe=7, data=5, model=3): 247, ProcessCoord(pipe=7, data=6, model=0): 248, ProcessCoord(pipe=7, data=6, model=1): 249, ProcessCoord(pipe=7, data=6, model=2): 250, ProcessCoord(pipe=7, data=6, model=3): 251, ProcessCoord(pipe=7, data=7, model=0): 252, ProcessCoord(pipe=7, data=7, model=1): 253, ProcessCoord(pipe=7, data=7, model=2): 254, ProcessCoord(pipe=7, data=7, model=3): 255}
[2021-09-24 04:01:42,442] [INFO] [module.py:360:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=7
     0: _to_float16
     1: EmbeddingPipe
     2:
     3: ParallelTransformerLayerPipe
     4: ParallelTransformerLayerPipe
     5: ParallelTransformerLayerPipe
     6: ParallelTransformerLayerPipe
stage=1 layers=4
     7: ParallelTransformerLayerPipe
     8: ParallelTransformerLayerPipe
     9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
stage=2 layers=4
    11: ParallelTransformerLayerPipe
    12: ParallelTransformerLayerPipe
    13: ParallelTransformerLayerPipe
    14: ParallelTransformerLayerPipe
stage=3 layers=4
    15: ParallelTransformerLayerPipe
    16: ParallelTransformerLayerPipe
    17: ParallelTransformerLayerPipe
    18: ParallelTransformerLayerPipe
stage=4 layers=4
    19: ParallelTransformerLayerPipe
    20: ParallelTransformerLayerPipe
    21: ParallelTransformerLayerPipe
    22: ParallelTransformerLayerPipe
stage=5 layers=4
    23: ParallelTransformerLayerPipe
    24: ParallelTransformerLayerPipe
    25: ParallelTransformerLayerPipe
    26: ParallelTransformerLayerPipe
stage=6 layers=4
    27: ParallelTransformerLayerPipe
    28: ParallelTransformerLayerPipe
    29: ParallelTransformerLayerPipe
    30: ParallelTransformerLayerPipe
stage=7 layers=8
    31: ParallelTransformerLayerPipe
    32: ParallelTransformerLayerPipe
    33: ParallelTransformerLayerPipe
    34: ParallelTransformerLayerPipe
    35:
    36: MixedFusedLayerNorm
    37: EmbeddingPipe
    38: float16_to_fp32
  loss: CrossEntropy
> number of parameters on (tensor, pipeline) model parallel rank (1, 5): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (0, 2): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (3, 2): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (0, 1): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (2, 1): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (0, 4): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (2, 4): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (3, 4): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (1, 4): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (2, 2): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (0, 6): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (2, 6): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (3, 6): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (1, 6): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (0, 3): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (1, 3): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (2, 3): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (3, 3): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (3, 1): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (1, 1): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (0, 5): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (1, 2): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (3, 5): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (2, 5): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (1, 7): 1986498560
> number of parameters on (tensor, pipeline) model parallel rank (2, 7): 1986498560
> number of parameters on (tensor, pipeline) model parallel rank (0, 7): 1986498560
> number of parameters on (tensor, pipeline) model parallel rank (3, 7): 1986498560
> number of parameters on (tensor, pipeline) model parallel rank (2, 0): 1986465792
> number of parameters on (tensor, pipeline) model parallel rank (1, 0): 1986465792
> number of parameters on (tensor, pipeline) model parallel rank (3, 0): 1986465792
[2021-09-24 04:01:43,676] [INFO] [utils.py:680:see_memory_usage] After Building Model
[2021-09-24 04:01:43,677] [INFO] [utils.py:681:see_memory_usage] MA 3.77 GB Max_MA 3.79 GB CA 3.79 GB Max_CA 4 GB
[2021-09-24 04:01:43,677] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 37.55 GB, percent = 20.1%
> number of parameters on (tensor, pipeline) model parallel rank (0, 0): 1986465792
setting training iterations to 159576
> learning rate decay style: cosine
DeepSpeed is enabled.
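From the topology dump above, the 256 processes form a (pipe=8, data=8, tensor=4) grid: the tensor-model coordinate varies fastest, then the data-parallel coordinate, then the pipeline stage. The 39 entries of the pipeline module (fp16 cast, embedding, 32 ParallelTransformerLayerPipe blocks, final layernorm, tied embedding head, fp32 cast) are split 7/4/4/4/4/4/4/8 across the 8 stages. A small sketch restating that coordinate-to-rank layout; this is my own reading of the dump, not DeepSpeed's actual code:

    # Parallelism degrees read off the topology: model=0..3, data=0..7, pipe=0..7.
    TP, DP, PP = 4, 8, 8

    def coord_to_rank(pipe: int, data: int, model: int) -> int:
        """Tensor-model rank varies fastest, then data, then pipe."""
        return model + TP * (data + DP * pipe)

    # Spot-checks against entries printed in the topology dict above.
    assert coord_to_rank(0, 1, 0) == 4
    assert coord_to_rank(1, 0, 0) == 32
    assert coord_to_rank(7, 7, 3) == 255
    assert TP * DP * PP == 256   # one process per GPU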
[2021-09-24 04:01:43,733] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.4.2+bc17042, git-hash=bc17042, git-branch=big-science
[2021-09-24 04:01:43,813] [INFO] [engine.py:179:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-09-24 04:01:43,813] [INFO] [engine.py:736:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-09-24 04:01:43,813] [INFO] [engine.py:741:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-09-24 04:01:43,813] [INFO] [engine.py:750:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2021-09-24 04:01:43,813] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2021-09-24 04:01:43,813] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2021-09-24 04:01:43,814] [INFO] [stage2.py:106:__init__] Reduce bucket size 500000000
[2021-09-24 04:01:43,814] [INFO] [stage2.py:107:__init__] Allgather bucket size 500000000
[2021-09-24 04:01:43,814] [INFO] [stage2.py:108:__init__] CPU Offload: False
[2021-09-24 04:01:43,814] [INFO] [stage2.py:109:__init__] Round robin gradient partitioning: False
[2021-09-24 04:01:48,526] [INFO] [stage2.py:419:__init__] optimizer state initialized
[2021-09-24 04:01:48,527] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2021-09-24 04:01:48,527] [INFO] [engine.py:553:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2021-09-24 04:01:48,527] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2021-09-24 04:01:48,527] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-09-24 04:01:48,527] [INFO] [config.py:900:print] DeepSpeedEngine configuration:
[2021-09-24 04:01:48,527] [INFO] [config.py:904:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false }
[2021-09-24 04:01:48,527] [INFO] [config.py:904:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2021-09-24 04:01:48,527] [INFO] [config.py:904:print] allreduce_always_fp32 ........ False
[2021-09-24 04:01:48,527] [INFO] [config.py:904:print] amp_enabled .................. False
[2021-09-24 04:01:48,527] [INFO] [config.py:904:print] amp_params ................... False
[2021-09-24 04:01:48,527] [INFO] [config.py:904:print] checkpoint_tag_validation_enabled True
[2021-09-24 04:01:48,527] [INFO] [config.py:904:print] checkpoint_tag_validation_fail False
[2021-09-24 04:01:48,527] [INFO] [config.py:904:print] disable_allgather ............ False
[2021-09-24 04:01:48,527] [INFO] [config.py:904:print] dump_state ................... False
[2021-09-24 04:01:48,527] [INFO] [config.py:904:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
[2021-09-24 04:01:48,527] [INFO] [config.py:904:print] eigenvalue_enabled ........... False
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] eigenvalue_gas_boundary_resolution 1
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] eigenvalue_layer_name ........ bert.encoder.layer
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] eigenvalue_layer_num ......... 0
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] eigenvalue_max_iter .......... 100
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] eigenvalue_stability ......... 1e-06
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] eigenvalue_tol ............... 0.01
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] eigenvalue_verbose ........... False
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] elasticity_enabled ........... False
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null }
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] fp16_enabled ................. True
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] fp16_mixed_quantize .......... False
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] global_rank .................. 0
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] gradient_accumulation_steps .. 256
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] gradient_clipping ............ 1.0
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] gradient_predivide_factor .... 1.0
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] initial_dynamic_scale ........ 4096
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] loss_scale ................... 0
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] memory_breakdown ............. False
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] optimizer_legacy_fusion ...... False
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] optimizer_name ............... None
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] optimizer_params ............. None
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] pld_enabled .................. False
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] pld_params ................... False
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] prescale_gradients ........... False
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] quantize_change_rate ......... 0.001
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] quantize_groups .............. 1
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] quantize_offset .............. 1000
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] quantize_period .............. 1000
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] quantize_rounding ............ 0
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] quantize_start_bits .......... 16
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] quantize_target_bits ......... 8
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] quantize_training_enabled .... False
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] quantize_type ................ 0
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] quantize_verbose ............. False
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] scheduler_name ............... None
[2021-09-24 04:01:48,528] [INFO] [config.py:904:print] scheduler_params ............. None
[2021-09-24 04:01:48,529] [INFO] [config.py:904:print] sparse_attention ............. None
[2021-09-24 04:01:48,529] [INFO] [config.py:904:print] sparse_gradients_enabled ..... False
[2021-09-24 04:01:48,529] [INFO] [config.py:904:print] steps_per_print .............. 2000
[2021-09-24 04:01:48,529] [INFO] [config.py:904:print] tensorboard_enabled .......... False
[2021-09-24 04:01:48,529] [INFO] [config.py:904:print] tensorboard_job_name ......... DeepSpeedJobName
[2021-09-24 04:01:48,529] [INFO] [config.py:904:print] tensorboard_output_path ......
[2021-09-24 04:01:48,529] [INFO] [config.py:904:print] train_batch_size ............. 2048
[2021-09-24 04:01:48,529] [INFO] [config.py:904:print] train_micro_batch_size_per_gpu 1
[2021-09-24 04:01:48,529] [INFO] [config.py:904:print] use_quantizer_kernel ......... False
[2021-09-24 04:01:48,529] [INFO] [config.py:904:print] wall_clock_breakdown ......... False
[2021-09-24 04:01:48,529] [INFO] [config.py:904:print] world_size ................... 8
[2021-09-24 04:01:48,529] [INFO] [config.py:904:print] zero_allow_untested_optimizer False
[2021-09-24 04:01:48,529] [INFO] [config.py:904:print] zero_config .................. { "stage": 1, "contiguous_gradients": false, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": true, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_fp16_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false }
[2021-09-24 04:01:48,529] [INFO] [config.py:904:print] zero_enabled ................. True
[2021-09-24 04:01:48,529] [INFO] [config.py:904:print] zero_optimization_stage ...... 1
[2021-09-24 04:01:48,529] [INFO] [config.py:906:print] json = { "train_micro_batch_size_per_gpu": 1, "train_batch_size": 2.048000e+03, "gradient_clipping": 1.0, "zero_optimization": { "stage": 1 }, "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12 }, "steps_per_print": 2.000000e+03, "wall_clock_breakdown": false }
[2021-09-24 04:01:48,529] [INFO] [engine.py:76:__init__] CONFIG: micro_batches=256 micro_batch_size=1
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=0 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=2 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=3 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=1 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=67 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=64 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=66 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=130 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=129 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=131 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=128 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=193 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=194 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=195 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=65 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=226 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=225 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=227 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=224 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=99 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=96 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=97 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=35 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=33 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=32 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=34 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=163 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=161 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=160 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=162 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=192 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] RANK=98 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
> using checkpoint value 6e-05 for learning rate
> using checkpoint value 6e-06 for minimum learning rate
> using checkpoint value 216320 for warmup iterations
> using checkpoint value 126953125 for total number of iterations
> using checkpoint value cosine for decay style
successfully loaded 8 ZeRO state_dicts for rank 124 successfully loaded 8 ZeRO state_dicts for rank 115 successfully loaded 8 ZeRO state_dicts for rank 60 successfully loaded 8 ZeRO state_dicts for rank 48 successfully loaded 8 ZeRO state_dicts for rank 61 successfully loaded 8 ZeRO state_dicts for rank 125 successfully loaded 8 ZeRO state_dicts for rank 126 successfully loaded 8 ZeRO state_dicts for rank 127 successfully loaded 8 ZeRO state_dicts for rank 160 successfully loaded 8 ZeRO state_dicts for rank 135 successfully loaded 8 ZeRO state_dicts for rank 68 successfully loaded 8 ZeRO state_dicts for rank 113 successfully loaded 8 ZeRO state_dicts for rank 108 successfully loaded 8 ZeRO state_dicts for rank 27 successfully loaded 8 ZeRO state_dicts for rank 72 successfully loaded 8 ZeRO state_dicts for rank 49 successfully loaded 8 ZeRO state_dicts for rank 71 successfully loaded 8 ZeRO state_dicts for rank 147 successfully loaded 8 ZeRO state_dicts for rank 96 successfully loaded 8 ZeRO state_dicts for rank 32 successfully loaded 8 ZeRO state_dicts for rank 214 successfully loaded 8 ZeRO state_dicts for rank 143 successfully loaded 8 ZeRO state_dicts for rank 158 successfully loaded 8 ZeRO state_dicts for rank 132 successfully loaded 8 ZeRO state_dicts for rank 111 successfully loaded 8 ZeRO state_dicts for rank 155 successfully loaded 8 ZeRO state_dicts for rank 112 successfully loaded 8 ZeRO state_dicts for rank 76 successfully loaded 8 ZeRO state_dicts for rank 63 successfully loaded 8 ZeRO state_dicts for rank 44 successfully loaded 8 ZeRO state_dicts
for rank 201 successfully loaded 8 ZeRO state_dicts for rank 213 successfully loaded 8 ZeRO state_dicts for rank 162 successfully loaded 8 ZeRO state_dicts for rank 97 successfully loaded 8 ZeRO state_dicts for rank 51 successfully loaded 8 ZeRO state_dicts for rank 133 loading 8 zero partition checkpoints for rank 124 successfully loaded 8 ZeRO state_dicts for rank 114 successfully loaded 8 ZeRO state_dicts for rank 33 successfully loaded 8 ZeRO state_dicts for rank 140 successfully loaded 8 ZeRO state_dicts for rank 181 successfully loaded 8 ZeRO state_dicts for rank 41 successfully loaded 8 ZeRO state_dicts for rank 185 successfully loaded 8 ZeRO state_dicts for rank 241 successfully loaded 8 ZeRO state_dicts for rank 134 successfully loaded 8 ZeRO state_dicts for rank 39 successfully loaded 8 ZeRO state_dicts for rank 24 successfully loaded 8 ZeRO state_dicts for rank 212 successfully loaded 8 ZeRO state_dicts for rank 104 successfully loaded 8 ZeRO state_dicts for rank 142 successfully loaded 8 ZeRO state_dicts for rank 154 successfully loaded 8 ZeRO state_dicts for rank 159 successfully loaded 8 ZeRO state_dicts for rank 166 successfully loaded 8 ZeRO state_dicts for rank 148 successfully loaded 8 ZeRO state_dicts for rank 35 successfully loaded 8 ZeRO state_dicts for rank 70 successfully loaded 8 ZeRO state_dicts for rank 75 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-24 04:02:16 CEST)" was missed by 0:00:03.600668 successfully loaded 8 ZeRO state_dicts for rank 156 successfully loaded 8 ZeRO state_dicts for rank 161 successfully loaded 8 ZeRO state_dicts for rank 243 successfully loaded 8 ZeRO state_dicts for rank 40 successfully loaded 8 ZeRO state_dicts for rank 141 successfully loaded 8 ZeRO state_dicts for rank 98 successfully loaded 8 ZeRO state_dicts for rank 210 successfully loaded 8 ZeRO state_dicts for rank 52 successfully loaded 8 ZeRO state_dicts for rank 28 successfully loaded 8 ZeRO state_dicts for rank 110 successfully loaded 8 ZeRO state_dicts for rank 139 successfully loaded 8 ZeRO state_dicts for rank 36 successfully loaded 8 ZeRO state_dicts for rank 168 successfully loaded 8 ZeRO state_dicts for rank 26 successfully loaded 8 ZeRO state_dicts for rank 84 successfully loaded 8 ZeRO state_dicts for rank 208 successfully loaded 8 ZeRO state_dicts for rank 190 successfully loaded 8 ZeRO state_dicts for rank 92 loading 8 zero partition checkpoints for rank 115 successfully loaded 8 ZeRO state_dicts for rank 34 successfully loaded 8 ZeRO state_dicts for rank 171 successfully loaded 8 ZeRO state_dicts for rank 152 successfully loaded 8 ZeRO state_dicts for rank 73 successfully loaded 8 ZeRO state_dicts for rank 47 successfully loaded 8 ZeRO state_dicts for rank 62 successfully loaded 8 ZeRO state_dicts for rank 150 successfully loaded 8 ZeRO state_dicts for rank 69 successfully loaded 8 ZeRO state_dicts for rank 157 successfully loaded 8 ZeRO state_dicts for rank 182 successfully loaded 8 ZeRO state_dicts for rank 145 successfully loaded 8 ZeRO state_dicts for rank 79 successfully loaded 8 ZeRO state_dicts for rank 88 successfully loaded 8 ZeRO state_dicts for rank 109 successfully loaded 8 ZeRO state_dicts for rank 56 successfully loaded 8 ZeRO state_dicts for rank 149 successfully loaded 8 ZeRO state_dicts for rank 50 successfully loaded 8 ZeRO state_dicts for rank 42 successfully loaded 8 ZeRO state_dicts for rank 206 successfully loaded 8 ZeRO state_dicts for rank 196 
successfully loaded 8 ZeRO state_dicts for rank 80 successfully loaded 8 ZeRO state_dicts for rank 215 successfully loaded 8 ZeRO state_dicts for rank 74 successfully loaded 8 ZeRO state_dicts for rank 43 successfully loaded 8 ZeRO state_dicts for rank 99 successfully loaded 8 ZeRO state_dicts for rank 192 successfully loaded 8 ZeRO state_dicts for rank 78 successfully loaded 8 ZeRO state_dicts for rank 37 successfully loaded 8 ZeRO state_dicts for rank 216 successfully loaded 8 ZeRO state_dicts for rank 153 successfully loaded 8 ZeRO state_dicts for rank 77 loading 8 zero partition checkpoints for rank 126 loading 8 zero partition checkpoints for rank 125 successfully loaded 8 ZeRO state_dicts for rank 193 successfully loaded 8 ZeRO state_dicts for rank 151 successfully loaded 8 ZeRO state_dicts for rank 59 successfully loaded 8 ZeRO state_dicts for rank 180 successfully loaded 8 ZeRO state_dicts for rank 220 successfully loaded 8 ZeRO state_dicts for rank 100 successfully loaded 8 ZeRO state_dicts for rank 107 successfully loaded 8 ZeRO state_dicts for rank 90 successfully loaded 8 ZeRO state_dicts for rank 130 successfully loaded 8 ZeRO state_dicts for rank 163 successfully loaded 8 ZeRO state_dicts for rank 164 successfully loaded 8 ZeRO state_dicts for rank 205 successfully loaded 8 ZeRO state_dicts for rank 94 successfully loaded 8 ZeRO state_dicts for rank 144 successfully loaded 8 ZeRO state_dicts for rank 225 successfully loaded 8 ZeRO state_dicts for rank 25 successfully loaded 8 ZeRO state_dicts for rank 217 successfully loaded 8 ZeRO state_dicts for rank 184 successfully loaded 8 ZeRO state_dicts for rank 172 successfully loaded 8 ZeRO state_dicts for rank 128 successfully loaded 8 ZeRO state_dicts for rank 15 successfully loaded 8 ZeRO state_dicts for rank 131 successfully loaded 8 ZeRO state_dicts for rank 46 successfully loaded 8 ZeRO state_dicts for rank 170 successfully loaded 8 ZeRO state_dicts for rank 198 successfully loaded 8 ZeRO state_dicts for rank 58 successfully loaded 8 ZeRO state_dicts for rank 248 successfully loaded 8 ZeRO state_dicts for rank 13 loading 8 zero partition checkpoints for rank 127 successfully loaded 8 ZeRO state_dicts for rank 183 successfully loaded 8 ZeRO state_dicts for rank 64 successfully loaded 8 ZeRO state_dicts for rank 105 successfully loaded 8 ZeRO state_dicts for rank 55 successfully loaded 8 ZeRO state_dicts for rank 66 successfully loaded 8 ZeRO state_dicts for rank 14 successfully loaded 8 ZeRO state_dicts for rank 240 successfully loaded 8 ZeRO state_dicts for rank 81 successfully loaded 8 ZeRO state_dicts for rank 186 successfully loaded 8 ZeRO state_dicts for rank 65 successfully loaded 8 ZeRO state_dicts for rank 146 successfully loaded 8 ZeRO state_dicts for rank 93 successfully loaded 8 ZeRO state_dicts for rank 200 successfully loaded 8 ZeRO state_dicts for rank 138 successfully loaded 8 ZeRO state_dicts for rank 211 successfully loaded 8 ZeRO state_dicts for rank 45 successfully loaded 8 ZeRO state_dicts for rank 38 successfully loaded 8 ZeRO state_dicts for rank 229 successfully loaded 8 ZeRO state_dicts for rank 129 successfully loaded 8 ZeRO state_dicts for rank 31 successfully loaded 8 ZeRO state_dicts for rank 197 successfully loaded 8 ZeRO state_dicts for rank 177 successfully loaded 8 ZeRO state_dicts for rank 116 successfully loaded 8 ZeRO state_dicts for rank 89 successfully loaded 8 ZeRO state_dicts for rank 117 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: 
interval[0:01:00], next run at: 2021-09-24 04:02:20 CEST)" was missed by 0:00:03.124446 successfully loaded 8 ZeRO state_dicts for rank 23 successfully loaded 8 ZeRO state_dicts for rank 188 successfully loaded 8 ZeRO state_dicts for rank 137 successfully loaded 8 ZeRO state_dicts for rank 4 successfully loaded 8 ZeRO state_dicts for rank 167 successfully loaded 8 ZeRO state_dicts for rank 236 loading 8 zero partition checkpoints for rank 61 successfully loaded 8 ZeRO state_dicts for rank 207 successfully loaded 8 ZeRO state_dicts for rank 203 successfully loaded 8 ZeRO state_dicts for rank 176 successfully loaded 8 ZeRO state_dicts for rank 174 successfully loaded 8 ZeRO state_dicts for rank 202 successfully loaded 8 ZeRO state_dicts for rank 82 successfully loaded 8 ZeRO state_dicts for rank 169 loading 8 zero partition checkpoints for rank 48 successfully loaded 8 ZeRO state_dicts for rank 209 successfully loaded 8 ZeRO state_dicts for rank 106 successfully loaded 8 ZeRO state_dicts for rank 195 successfully loaded 8 ZeRO state_dicts for rank 136 successfully loaded 8 ZeRO state_dicts for rank 8 successfully loaded 8 ZeRO state_dicts for rank 178 successfully loaded 8 ZeRO state_dicts for rank 219 successfully loaded 8 ZeRO state_dicts for rank 204 successfully loaded 8 ZeRO state_dicts for rank 53 successfully loaded 8 ZeRO state_dicts for rank 235 successfully loaded 8 ZeRO state_dicts for rank 191 loading 8 zero partition checkpoints for rank 60 successfully loaded 8 ZeRO state_dicts for rank 227 successfully loaded 8 ZeRO state_dicts for rank 120 successfully loaded 8 ZeRO state_dicts for rank 175 successfully loaded 8 ZeRO state_dicts for rank 250 successfully loaded 8 ZeRO state_dicts for rank 189 successfully loaded 8 ZeRO state_dicts for rank 6 successfully loaded 8 ZeRO state_dicts for rank 237 successfully loaded 8 ZeRO state_dicts for rank 118 successfully loaded 8 ZeRO state_dicts for rank 119 loading 8 zero partition checkpoints for rank 68 successfully loaded 8 ZeRO state_dicts for rank 22 successfully loaded 8 ZeRO state_dicts for rank 91 successfully loaded 8 ZeRO state_dicts for rank 86 successfully loaded 8 ZeRO state_dicts for rank 83 successfully loaded 8 ZeRO state_dicts for rank 87 successfully loaded 8 ZeRO state_dicts for rank 121 successfully loaded 8 ZeRO state_dicts for rank 218 successfully loaded 8 ZeRO state_dicts for rank 221 loading 8 zero partition checkpoints for rank 113 successfully loaded 8 ZeRO state_dicts for rank 9 successfully loaded 8 ZeRO state_dicts for rank 222 successfully loaded 8 ZeRO state_dicts for rank 251 loading 8 zero partition checkpoints for rank 72 successfully loaded 8 ZeRO state_dicts for rank 179 successfully loaded 8 ZeRO state_dicts for rank 247 successfully loaded 8 ZeRO state_dicts for rank 12 successfully loaded 8 ZeRO state_dicts for rank 29 successfully loaded 8 ZeRO state_dicts for rank 95 successfully loaded 8 ZeRO state_dicts for rank 231 successfully loaded 8 ZeRO state_dicts for rank 239 successfully loaded 8 ZeRO state_dicts for rank 245 loading 8 zero partition checkpoints for rank 32 successfully loaded 8 ZeRO state_dicts for rank 255 successfully loaded 8 ZeRO state_dicts for rank 232 successfully loaded 8 ZeRO state_dicts for rank 238 successfully loaded 8 ZeRO state_dicts for rank 7 successfully loaded 8 ZeRO state_dicts for rank 228 successfully loaded 8 ZeRO state_dicts for rank 67 successfully loaded 8 ZeRO state_dicts for rank 252 successfully loaded 8 ZeRO state_dicts for rank 187 successfully loaded 8 
ZeRO state_dicts for rank 230
successfully loaded 8 ZeRO state_dicts for rank 244
successfully loaded 8 ZeRO state_dicts for rank 194
loading 8 zero partition checkpoints for rank 112
[... the same two per-rank messages, "successfully loaded 8 ZeRO state_dicts for rank N" and "loading 8 zero partition checkpoints for rank N", repeat interleaved until all 256 ranks (0-255) have reported both ...]
checkpoint version 3.0
loading 8 zero partition checkpoints for rank 16
loading 8 zero partition checkpoints for rank 17
loading 8 zero partition checkpoints for rank 19
successfully loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints at iteration 474
time (ms) | load-checkpoint: 86577.34
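With 256 processes each printing two unordered status lines, it is easy to miss a rank that never finished restoring its ZeRO partitions. Below is a minimal post-hoc check over a saved copy of this console output ("train.log" is a hypothetical file name; the message formats are taken verbatim from the lines above):

```python
import re
from collections import defaultdict

# Count the two per-rank messages in a saved copy of the console log
# ("train.log" is a hypothetical file name).
loaded = defaultdict(int)    # rank -> "successfully loaded ... ZeRO state_dicts" count
restored = defaultdict(int)  # rank -> "loading ... zero partition checkpoints" count

pat_loaded = re.compile(r"successfully loaded \d+ ZeRO state_dicts for rank (\d+)")
pat_restored = re.compile(r"loading \d+ zero partition checkpoints for rank (\d+)")

with open("train.log") as f:
    for line in f:
        # finditer, because several messages can be flattened onto one line
        for m in pat_loaded.finditer(line):
            loaded[int(m.group(1))] += 1
        for m in pat_restored.finditer(line):
            restored[int(m.group(1))] += 1

world_size = 256  # ranks 0-255 appear in this log
missing = [r for r in range(world_size) if loaded[r] != 1 or restored[r] != 1]
print("ranks with missing/duplicate checkpoint messages:", missing or "none")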
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-09-24 04:03:15
> building train, validation, and test datasets ...
> datasets target sizes (minimum size):
    train:      300000000
    validation: 1638400
    test:       10240
> building train, validation, and test datasets for GPT ...
 > building dataset index ...
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
 > finished creating indexed dataset in 0.164226 seconds
    number of documents: 304230423
 > dataset split:
    train:
     document indices in [0, 288714672) total of 288714672 documents
    validation:
     document indices in [288714672, 303926193) total of 15211521 documents
    test:
     document indices in [303926193, 304230423) total of 304230 documents
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_shuffle_idx.npy
    loaded indexed file in 0.365 seconds
    total number of samples: 394611670
    total number of epochs: 3
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_42s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_42s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_42s_shuffle_idx.npy
    loaded indexed file in 0.203 seconds
    total number of samples: 6927161
    total number of epochs: 1
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_42s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_42s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_42s_shuffle_idx.npy
    loaded indexed file in 0.072 seconds
    total number of samples: 137384
    total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-09-24 04:03:22
done with setup ...
training ...
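The split boundaries and the 3-epoch figure above can be reproduced with a few lines of arithmetic. The 949/50/1 train/valid/test ratio below is an assumption inferred from the boundaries (the actual --split argument is not shown in this excerpt), and the remainder-to-train adjustment mirrors the style of Megatron's split computation rather than quoting it:

```python
import math

total_docs = 304_230_423
fractions = (0.949, 0.050, 0.001)  # assumed train/valid/test split
sizes = [int(round(f * total_docs)) for f in fractions]
sizes[0] += total_docs - sum(sizes)  # hand the rounding remainder to train
bounds = [0]
for s in sizes:
    bounds.append(bounds[-1] + s)
print(bounds)  # [0, 288714672, 303926193, 304230423] -- matches the log

# One pass over the shuffled train split yields ~394611670 / 3 ≈ 131.5M
# samples, so reaching the 300M-sample target takes ceil(300M / 131.5M) = 3 epochs.
samples_total, epochs = 394_611_670, 3
per_epoch = samples_total / epochs
print(math.ceil(300_000_000 / per_epoch))  # -> 3
```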
time (ms) | model-and-optimizer-setup: 94922.27 | train/valid/test-data-iterators-setup: 5644.20
[before the start of training step] datetime: 2021-09-24 04:03:22
[2021-09-24 04:03:22,280] [INFO] [checkpointing.py:408:forward] Activation Checkpointing Information
[2021-09-24 04:03:22,280] [INFO] [checkpointing.py:409:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-09-24 04:03:22,281] [INFO] [checkpointing.py:412:forward] ----contiguous Memory Checkpointing False with 32 total layers
[2021-09-24 04:03:22,281] [INFO] [checkpointing.py:415:forward] ----Synchronization False
[2021-09-24 04:03:22,281] [INFO] [checkpointing.py:416:forward] ----Profiling time in checkpointing False
[2021-09-24 04:03:47] PULSE: tr8-104B is waiting to be scheduled (1159457_[1-10%1] on 'gpu_p13' partition)
[2021-09-24 04:03:47] PULSE: tr8-104B is scheduled to start in 18:10:24 (at 2021-09-24T22:14:12) (1161605 on 'gpu_p13' partition)
[2021-09-24 04:03:47] PULSE: tr8-104B is running for 2:42 since 2021-09-24T04:01:05 (1162747 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8])
[Rank 33] (after 475 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 65] (after 475 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18826.0 | max reserved: 18826.0
[Rank 1] (after 475 iterations) memory (MB) | allocated: 6661.611328125 | max allocated: 11742.55810546875 | reserved: 21150.0 | max reserved: 21150.0
[Rank 225] (after 475 iterations) memory (MB) | allocated: 7107.70751953125 | max allocated: 11884.6845703125 | reserved: 22108.0 | max reserved: 22108.0
[Rank 97] (after 475 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 129] (after 475 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18586.0 | max reserved: 18586.0
[Rank 193] (after 475 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18778.0 | max reserved: 18778.0
[Rank 161] (after 475 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 2] (after 475 iterations) memory (MB) | allocated: 6661.611328125 | max allocated: 11742.55810546875 | reserved: 22878.0 | max reserved: 22878.0
[Rank 226] (after 475 iterations) memory (MB) | allocated: 7107.70751953125 | max allocated: 11884.6845703125 | reserved: 20752.0 | max reserved: 20752.0
[Rank 34] (after 475 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18586.0 | max reserved: 18586.0
[Rank 66] (after 475 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18586.0 | max reserved: 18586.0
[Rank 98] (after 475 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18522.0 | max reserved: 18522.0
[Rank 130] (after 475 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18586.0 | max reserved: 18586.0
[Rank 194] (after 475 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 162] (after 475 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 0] (after 475 iterations) memory (MB) | allocated: 6661.611328125 | max allocated: 11742.55810546875 | reserved: 23514.0 | max reserved: 23514.0
[Rank 224] (after 475 iterations) memory (MB) | allocated: 7107.70751953125 | max allocated: 11884.6845703125 | reserved: 22108.0 | max reserved: 22108.0
[Rank 32] (after 475 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 19012.0 | max reserved: 19012.0
[Rank 64] (after 475 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 19012.0 | max reserved: 19012.0
[Rank 96] (after 475 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 19012.0 | max reserved: 19012.0
[Rank 192] (after 475 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18884.0 | max reserved: 18884.0
[Rank 128] (after 475 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18884.0 | max reserved: 18884.0
[Rank 160] (after 475 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18868.0 | max reserved: 18868.0
[Rank 3] (after 475 iterations) memory (MB) | allocated: 6661.611328125 | max allocated: 11742.55810546875 | reserved: 22890.0 | max reserved: 22890.0
[Rank 35] (after 475 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 227] (after 475 iterations) memory (MB) | allocated: 7107.70751953125 | max allocated: 11884.6845703125 | reserved: 20752.0 | max reserved: 20752.0
[Rank 67] (after 475 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18586.0 | max reserved: 18586.0
[Rank 99] (after 475 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 131] (after 475 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18522.0 | max reserved: 18522.0
[Rank 195] (after 475 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 163] (after 475 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
iteration 475/ 159576 | consumed samples: 7600 | elapsed time per iteration (ms): 29962.7 | learning rate: 2.108E-06 | global batch size: 16 | lm loss: 7.833103E+00 | loss scale: 4096.0 | grad norm: 47969.708 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 476/ 159576 | consumed samples: 7616 | elapsed time per iteration (ms): 13562.3 | learning rate: 2.112E-06 | global batch size: 16 | lm loss: 7.715385E+00 | loss scale: 4096.0 | grad norm: 28643.174 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 477/ 159576 | consumed samples: 7632 | elapsed time per iteration (ms): 14532.6 | learning rate: 2.117E-06 | global batch size: 16 | lm loss: 7.912835E+00 | loss scale: 4096.0 | grad norm: 18978.073 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
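The per-rank memory lines come straight from torch.cuda's allocator statistics. A minimal sketch of how such a report can be produced (this mirrors the shape of the log lines above, not a verbatim copy of Megatron's helper):

```python
import torch

def report_memory(tag: str) -> None:
    """Print a memory report in the same shape as the log lines above.

    torch.cuda reports bytes for the current device; divide by 2**20 to get
    the MB magnitudes seen in the log.
    """
    mb = 1024.0 * 1024.0
    print(
        f"{tag} memory (MB)"
        f" | allocated: {torch.cuda.memory_allocated() / mb}"
        f" | max allocated: {torch.cuda.max_memory_allocated() / mb}"
        f" | reserved: {torch.cuda.memory_reserved() / mb}"
        f" | max reserved: {torch.cuda.max_memory_reserved() / mb}",
        flush=True,
    )

# e.g. on rank 33 after iteration 475:
# report_memory("[Rank 33] (after 475 iterations)")
```

Note the asymmetry in the numbers: ranks 0-3 (~6.7 GB allocated) and ranks 224-227 (~7.1 GB) sit at the two ends of the pipeline, presumably carrying the embedding and loss layers, while the middle stages all allocate ~5.9 GB.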
iteration 478/ 159576 | consumed samples: 7648 | elapsed time per iteration (ms): 13659.0 | learning rate: 2.121E-06 | global batch size: 16 | lm loss: 7.845491E+00 | loss scale: 4096.0 | grad norm: 29417.161 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[... iterations 479-499 elided: same format, loss scale 4096.0, lm loss fluctuating between ~7.67 and ~8.00, grad norm between ~13k and ~29k, 13.3-14.6 s per iteration, learning rate ramping from 2.126E-06 to 2.214E-06 ...]
iteration 500/ 159576 | consumed samples: 8000 | elapsed time per iteration (ms): 13842.0 | learning rate: 2.219E-06 | global batch size: 16 | lm loss: 7.828729E+00 | loss scale: 8192.0 | grad norm: 13323.901 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 501/ 159576 | consumed samples: 8016 | elapsed time per iteration (ms): 13887.5 | learning rate: 2.223E-06 | global batch size: 16 | lm loss: 7.889397E+00 | loss scale: 8192.0 | grad norm: 36733.789 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
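The jump in loss scale from 4096.0 (through iteration 499) to 8192.0 (from iteration 500 on) is fp16 dynamic loss scaling at work: after a fixed window of consecutive overflow-free steps the scale is doubled, and it is halved whenever gradients overflow. A minimal sketch of the mechanism; the 500-step window below is an assumption suggested by the timing in this log, not a value read from the tr8-104B config:

```python
class DynamicLossScaler:
    """Minimal sketch of DeepSpeed/Megatron-style dynamic loss scaling.

    Assumptions (not read from the tr8-104B config): initial scale 4096,
    scale_window=500 overflow-free steps before doubling, factor 2.
    """

    def __init__(self, init_scale=4096.0, scale_window=500, scale_factor=2.0):
        self.scale = init_scale
        self.scale_window = scale_window
        self.scale_factor = scale_factor
        self.overflow_free_steps = 0

    def update(self, found_overflow: bool) -> None:
        if found_overflow:
            # gradients overflowed in fp16: shrink the scale; the step is skipped
            self.scale = max(1.0, self.scale / self.scale_factor)
            self.overflow_free_steps = 0
        else:
            self.overflow_free_steps += 1
            if self.overflow_free_steps == self.scale_window:
                self.scale *= self.scale_factor  # e.g. 4096.0 -> 8192.0 here
                self.overflow_free_steps = 0

scaler = DynamicLossScaler()
for step in range(500):
    scaler.update(found_overflow=False)
print(scaler.scale)  # 8192.0
```

The "number of skipped iterations: 0" field in each log line counts exactly those overflow steps; so far none of the steps in this excerpt have been skipped.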
iteration 502/ 159576 | consumed samples: 8032 | elapsed time per iteration (ms): 14250.0 | learning rate: 2.228E-06 | global batch size: 16 | lm loss: 7.699535E+00 | loss scale: 8192.0 | grad norm: 25128.484 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[... iterations 503-516 elided: same format, loss scale 8192.0, lm loss ~7.66-7.95, grad norm ~25k-40k, 13.5-14.6 s per iteration, learning rate ramping from 2.232E-06 to 2.290E-06 ...]
iteration 517/ 159576 | consumed samples: 8272 | elapsed time per iteration (ms): 13457.2 | learning rate: 2.294E-06 | global batch size: 16 | lm loss: 7.840482E+00 | loss scale: 8192.0 | grad norm: 34709.122 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-24 04:13:42] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1162855_[1-10%1] on 'gpu_p13' partition)
[2021-09-24 04:13:42] PULSE: tr8-104B is running for 12:37 since 2021-09-24T04:01:05 (1162747 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8])
iteration 518/ 159576 | consumed samples: 8288 | elapsed time per iteration (ms): 13506.3 | learning rate: 2.299E-06 | global batch size: 16 | lm loss: 7.914812E+00 | loss scale: 8192.0 | grad norm: 24295.892 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[... iterations 519-522 elided: same format, loss scale 8192.0, lm loss ~7.71-7.82, learning rate 2.303E-06 to 2.317E-06 ...]
iteration 523/ 159576 | consumed samples: 8368 | elapsed time per iteration (ms): 13893.6 | learning rate: 2.321E-06 | global batch size: 16 | lm loss: 7.845006E+00 | loss scale: 8192.0 | grad norm: 34359.630 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
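Throughout this excerpt the learning rate climbs by a constant ~4.4E-09 per step, i.e. training is still deep in the linear warmup phase. Since the global batch size is 16, that is ~2.8E-10 per consumed sample. A quick fit from two logged points, assuming the linear-over-samples warmup that Megatron-LM uses (the peak lr and warmup length are not shown in this excerpt, so only the slope is derived):

```python
# Fit the warmup slope from two logged (consumed_samples, lr) points.
s1, lr1 = 7600, 2.108e-06   # iteration 475
s2, lr2 = 8368, 2.321e-06   # iteration 523
slope = (lr2 - lr1) / (s2 - s1)          # lr increase per consumed sample
print(f"{slope:.3e} per sample")         # ~2.77e-10
print(f"{slope * 16:.3e} per step")      # ~4.4e-09 at global batch size 16

# Extrapolated lr at iteration 628 (10048 consumed samples):
print(f"{lr1 + slope * (10048 - 7600):.3e}")  # ~2.787e-06, matching the log
```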
iteration 524/ 159576 | consumed samples: 8384 | elapsed time per iteration (ms): 13874.2 | learning rate: 2.325E-06 | global batch size: 16 | lm loss: 7.806132E+00 | loss scale: 8192.0 | grad norm: 34509.027 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[... iterations 525-531 elided: same format, loss scale 8192.0, lm loss ~7.71-7.99, grad norm ~27k-47k, learning rate 2.330E-06 to 2.357E-06 ...]
iteration 532/ 159576 | consumed samples: 8512 | elapsed time per iteration (ms): 13935.0 | learning rate: 2.361E-06 | global batch size: 16 | lm loss: 7.367872E+00 | loss scale: 8192.0 | grad norm: 51156.079 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[... iterations 533-569 elided: same format, loss scale 8192.0, lm loss ~7.63-8.00, grad norm ~27k-66k, learning rate ramping from 2.365E-06 to 2.525E-06 ...]
iteration 570/ 159576 | consumed samples: 9120 | elapsed time per iteration (ms): 13702.3 | learning rate: 2.530E-06 | global batch size: 16 | lm loss: 7.848298E+00 | loss scale: 8192.0 | grad norm: 53032.711 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
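At a global batch size of 16 sequences of 2048 tokens (the sequence length is taken from the "2048sl" index-map filenames above) and ~13.9 s per iteration, throughput at this early warmup stage is modest. A back-of-the-envelope check:

```python
# Back-of-the-envelope throughput for this stretch of the log.
# seq_len comes from the "...2048sl..." index-map filenames; batch size and
# step times are read directly from the iteration lines.
global_batch_size = 16
seq_len = 2048
step_time_s = 13.9          # typical elapsed time per iteration above

samples_per_s = global_batch_size / step_time_s
tokens_per_s = samples_per_s * seq_len
print(f"{samples_per_s:.2f} samples/s, {tokens_per_s:.0f} tokens/s")
# -> 1.15 samples/s, 2358 tokens/s

# Samples accumulated per day at this rate:
print(f"{global_batch_size * 86400 / step_time_s:,.0f} samples/day")  # ~99,000
```

The tiny global batch size is deliberate at this point in the schedule; throughput rises as the batch-size ramp-up proceeds.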
iteration 571/ 159576 | consumed samples: 9136 | elapsed time per iteration (ms): 13866.4 | learning rate: 2.534E-06 | global batch size: 16 | lm loss: 7.659620E+00 | loss scale: 8192.0 | grad norm: 37376.686 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[... iterations 572-624 elided: same format, loss scale 8192.0, lm loss ~7.47-7.96, grad norm ~30k-78k, 13.2-14.4 s per iteration, learning rate ramping from 2.538E-06 to 2.769E-06 ...]
iteration 625/ 159576 | consumed samples: 10000 | elapsed time per iteration (ms): 14109.1 | learning rate: 2.774E-06 | global batch size: 16 | lm loss: 7.850559E+00 | loss scale: 8192.0 | grad norm: 49716.008 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 626/ 159576 | consumed samples: 10016 | elapsed time per iteration (ms): 13883.7 | learning rate: 2.778E-06 | global batch size: 16 | lm loss: 7.761450E+00 | loss scale: 8192.0 | grad norm: 40472.569 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 627/ 159576 | consumed samples: 10032 | elapsed time per iteration (ms): 13871.1 | learning rate: 2.783E-06 | global batch size: 16 | lm loss: 7.638558E+00 | loss scale: 8192.0 | grad norm: 32194.907 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 628/ 159576 | consumed samples: 10048 | elapsed time per iteration (ms): 14009.2 | learning rate: 2.787E-06 | global batch size: 16 | lm loss: 7.602344E+00 | loss scale: 8192.0 | grad norm: 48067.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 629/ 159576 |
consumed samples: 10064 | elapsed time per iteration (ms): 14668.1 | learning rate: 2.791E-06 | global batch size: 16 | lm loss: 7.641259E+00 | loss scale: 8192.0 | grad norm: 36222.940 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 630/ 159576 | consumed samples: 10080 | elapsed time per iteration (ms): 13862.3 | learning rate: 2.796E-06 | global batch size: 16 | lm loss: 7.665779E+00 | loss scale: 8192.0 | grad norm: 42515.535 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 631/ 159576 | consumed samples: 10096 | elapsed time per iteration (ms): 13588.5 | learning rate: 2.800E-06 | global batch size: 16 | lm loss: 7.754525E+00 | loss scale: 8192.0 | grad norm: 49054.878 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 632/ 159576 | consumed samples: 10112 | elapsed time per iteration (ms): 13844.9 | learning rate: 2.805E-06 | global batch size: 16 | lm loss: 7.774928E+00 | loss scale: 8192.0 | grad norm: 45662.541 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 633/ 159576 | consumed samples: 10128 | elapsed time per iteration (ms): 14341.8 | learning rate: 2.809E-06 | global batch size: 16 | lm loss: 7.554594E+00 | loss scale: 8192.0 | grad norm: 60744.743 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 634/ 159576 | consumed samples: 10144 | elapsed time per iteration (ms): 13746.1 | learning rate: 2.814E-06 | global batch size: 16 | lm loss: 7.637143E+00 | loss scale: 8192.0 | grad norm: 49330.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 635/ 159576 | consumed samples: 10160 | elapsed time per iteration (ms): 13894.5 | learning rate: 2.818E-06 | global batch size: 16 | lm loss: 7.983640E+00 | loss scale: 8192.0 | grad norm: 49417.095 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 636/ 159576 | consumed samples: 10176 | elapsed time per iteration (ms): 14194.7 | learning rate: 2.822E-06 | global batch size: 16 | lm loss: 7.681066E+00 | loss scale: 8192.0 | grad norm: 61468.093 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 637/ 159576 | consumed samples: 10192 | elapsed time per iteration (ms): 13961.2 | learning rate: 2.827E-06 | global batch size: 16 | lm loss: 7.862648E+00 | loss scale: 8192.0 | grad norm: 72192.162 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 638/ 159576 | consumed samples: 10208 | elapsed time per iteration (ms): 13647.5 | learning rate: 2.831E-06 | global batch size: 16 | lm loss: 7.569575E+00 | loss scale: 8192.0 | grad norm: 45669.961 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 639/ 159576 | consumed samples: 10224 | elapsed time per iteration (ms): 13856.5 | learning rate: 2.836E-06 | global batch size: 16 | lm loss: 7.844266E+00 | loss scale: 8192.0 | grad norm: 36677.085 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 640/ 159576 | consumed samples: 10240 | elapsed time per iteration (ms): 14073.9 | learning rate: 2.840E-06 | global batch size: 16 | lm loss: 7.845327E+00 | loss scale: 8192.0 | grad norm: 96907.467 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 641/ 159576 | consumed samples: 10256 | elapsed time per iteration (ms): 13796.2 | learning rate: 2.845E-06 | global batch size: 16 | lm loss: 7.647357E+00 | loss scale: 8192.0 | grad norm: 57700.704 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 642/ 159576 | consumed samples: 10272 | elapsed time per iteration (ms): 14118.9 | learning rate: 2.849E-06 | global batch size: 16 | lm loss: 7.207680E+00 | loss scale: 8192.0 | grad norm: 51064.672 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 643/ 159576 | consumed samples: 10288 | elapsed time per iteration (ms): 14102.7 | learning rate: 2.854E-06 | global batch size: 16 | lm loss: 7.651158E+00 | loss scale: 8192.0 | grad norm: 42382.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 644/ 159576 | consumed samples: 10304 | elapsed time per iteration (ms): 14051.2 | learning rate: 2.858E-06 | global batch size: 16 | lm loss: 7.854011E+00 | loss scale: 8192.0 | grad norm: 91247.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 645/ 159576 | consumed samples: 10320 | elapsed time per iteration (ms): 13538.9 | learning rate: 2.862E-06 | global batch size: 16 | lm loss: 7.769484E+00 | loss scale: 8192.0 | grad norm: 69652.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 646/ 159576 | consumed samples: 10336 | elapsed time per iteration (ms): 14249.0 | learning rate: 2.867E-06 | global batch size: 16 | lm loss: 7.553013E+00 | loss scale: 8192.0 | grad norm: 51636.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 647/ 159576 | consumed samples: 10352 | elapsed time per iteration (ms): 13970.2 | learning rate: 2.871E-06 | global batch size: 16 | lm loss: 8.084120E+00 | loss scale: 8192.0 | grad norm: 43277.569 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 648/ 159576 | consumed samples: 10368 | elapsed time per iteration (ms): 13853.5 | learning rate: 2.876E-06 | global batch size: 16 | lm loss: 7.727980E+00 | loss scale: 8192.0 | grad norm: 61582.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 649/ 159576 | consumed samples: 10384 | elapsed time per iteration (ms): 13732.7 | learning rate: 2.880E-06 | global batch size: 16 | lm loss: 8.087885E+00 | loss scale: 8192.0 | grad norm: 80675.460 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 650/ 159576 | consumed samples: 10400 | elapsed time per iteration (ms): 14065.0 | learning rate: 2.885E-06 | global batch size: 16 | lm loss: 7.735159E+00 | loss scale: 8192.0 | grad norm: 57826.799 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 651/ 159576 | consumed samples: 10416 | elapsed time per iteration (ms): 14427.2 | learning rate: 2.889E-06 | global batch size: 16 | lm loss: 7.631308E+00 | loss scale: 8192.0 | grad norm: 36267.499 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 652/ 159576 | consumed samples: 10432 | elapsed time per iteration (ms): 13615.7 | learning rate: 2.893E-06 | global batch size: 16 | lm loss: 
7.756464E+00 | loss scale: 8192.0 | grad norm: 90673.943 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 653/ 159576 | consumed samples: 10448 | elapsed time per iteration (ms): 13935.6 | learning rate: 2.898E-06 | global batch size: 16 | lm loss: 7.687772E+00 | loss scale: 8192.0 | grad norm: 73567.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 654/ 159576 | consumed samples: 10464 | elapsed time per iteration (ms): 14106.4 | learning rate: 2.902E-06 | global batch size: 16 | lm loss: 7.805472E+00 | loss scale: 8192.0 | grad norm: 43212.657 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 655/ 159576 | consumed samples: 10480 | elapsed time per iteration (ms): 13870.0 | learning rate: 2.907E-06 | global batch size: 16 | lm loss: 7.733329E+00 | loss scale: 8192.0 | grad norm: 42721.480 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 656/ 159576 | consumed samples: 10496 | elapsed time per iteration (ms): 13912.1 | learning rate: 2.911E-06 | global batch size: 16 | lm loss: 7.764544E+00 | loss scale: 8192.0 | grad norm: 95237.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 657/ 159576 | consumed samples: 10512 | elapsed time per iteration (ms): 13959.6 | learning rate: 2.916E-06 | global batch size: 16 | lm loss: 7.873410E+00 | loss scale: 8192.0 | grad norm: 58039.908 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 658/ 159576 | consumed samples: 10528 | elapsed time per iteration (ms): 14236.4 | learning rate: 2.920E-06 | global batch size: 16 | lm loss: 7.776018E+00 | loss scale: 8192.0 | grad norm: 47844.539 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 659/ 159576 | consumed samples: 10544 | elapsed time per iteration (ms): 14055.2 | learning rate: 2.925E-06 | global batch size: 16 | lm loss: 7.913632E+00 | loss scale: 8192.0 | grad norm: 52680.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 660/ 159576 | consumed samples: 10560 | elapsed time per iteration (ms): 13952.7 | learning rate: 2.929E-06 | global batch size: 16 | lm loss: 7.682195E+00 | loss scale: 8192.0 | grad norm: 43818.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 661/ 159576 | consumed samples: 10576 | elapsed time per iteration (ms): 14150.0 | learning rate: 2.933E-06 | global batch size: 16 | lm loss: 7.787490E+00 | loss scale: 8192.0 | grad norm: 79352.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 662/ 159576 | consumed samples: 10592 | elapsed time per iteration (ms): 13865.0 | learning rate: 2.938E-06 | global batch size: 16 | lm loss: 7.774850E+00 | loss scale: 8192.0 | grad norm: 38730.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 663/ 159576 | consumed samples: 10608 | elapsed time per iteration (ms): 14161.1 | learning rate: 2.942E-06 | global batch size: 16 | lm loss: 7.580084E+00 | loss scale: 8192.0 | grad norm: 41013.803 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 664/ 159576 | consumed samples: 10624 | elapsed time per 
iteration (ms): 13917.2 | learning rate: 2.947E-06 | global batch size: 16 | lm loss: 7.885849E+00 | loss scale: 8192.0 | grad norm: 52940.997 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 665/ 159576 | consumed samples: 10640 | elapsed time per iteration (ms): 14187.3 | learning rate: 2.951E-06 | global batch size: 16 | lm loss: 7.708643E+00 | loss scale: 8192.0 | grad norm: 45471.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 666/ 159576 | consumed samples: 10656 | elapsed time per iteration (ms): 13816.1 | learning rate: 2.956E-06 | global batch size: 16 | lm loss: 7.852731E+00 | loss scale: 8192.0 | grad norm: 34948.074 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 667/ 159576 | consumed samples: 10672 | elapsed time per iteration (ms): 13998.2 | learning rate: 2.960E-06 | global batch size: 16 | lm loss: 7.783283E+00 | loss scale: 8192.0 | grad norm: 72415.130 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 668/ 159576 | consumed samples: 10688 | elapsed time per iteration (ms): 14355.3 | learning rate: 2.964E-06 | global batch size: 16 | lm loss: 7.606567E+00 | loss scale: 8192.0 | grad norm: 40358.601 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 669/ 159576 | consumed samples: 10704 | elapsed time per iteration (ms): 13737.0 | learning rate: 2.969E-06 | global batch size: 16 | lm loss: 7.726189E+00 | loss scale: 8192.0 | grad norm: 40258.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 670/ 159576 | consumed samples: 10720 | elapsed time per iteration (ms): 13793.7 | learning rate: 2.973E-06 | global batch size: 16 | lm loss: 7.691747E+00 | loss scale: 8192.0 | grad norm: 41826.699 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 671/ 159576 | consumed samples: 10736 | elapsed time per iteration (ms): 13990.9 | learning rate: 2.978E-06 | global batch size: 16 | lm loss: 7.731771E+00 | loss scale: 8192.0 | grad norm: 73683.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 672/ 159576 | consumed samples: 10752 | elapsed time per iteration (ms): 14342.7 | learning rate: 2.982E-06 | global batch size: 16 | lm loss: 7.751697E+00 | loss scale: 8192.0 | grad norm: 45162.989 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 673/ 159576 | consumed samples: 10768 | elapsed time per iteration (ms): 14019.6 | learning rate: 2.987E-06 | global batch size: 16 | lm loss: 7.628830E+00 | loss scale: 8192.0 | grad norm: 50354.520 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 674/ 159576 | consumed samples: 10784 | elapsed time per iteration (ms): 13505.9 | learning rate: 2.991E-06 | global batch size: 16 | lm loss: 7.737679E+00 | loss scale: 8192.0 | grad norm: 42630.535 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 675/ 159576 | consumed samples: 10800 | elapsed time per iteration (ms): 14062.7 | learning rate: 2.996E-06 | global batch size: 16 | lm loss: 7.697219E+00 | loss scale: 8192.0 | grad norm: 74141.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 
0 | time (ms) iteration 676/ 159576 | consumed samples: 10816 | elapsed time per iteration (ms): 14348.9 | learning rate: 3.000E-06 | global batch size: 16 | lm loss: 7.685856E+00 | loss scale: 8192.0 | grad norm: 42229.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 677/ 159576 | consumed samples: 10832 | elapsed time per iteration (ms): 13490.6 | learning rate: 3.004E-06 | global batch size: 16 | lm loss: 7.675433E+00 | loss scale: 8192.0 | grad norm: 41266.542 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 678/ 159576 | consumed samples: 10848 | elapsed time per iteration (ms): 13864.0 | learning rate: 3.009E-06 | global batch size: 16 | lm loss: 7.602362E+00 | loss scale: 8192.0 | grad norm: 28128.791 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 679/ 159576 | consumed samples: 10864 | elapsed time per iteration (ms): 13876.8 | learning rate: 3.013E-06 | global batch size: 16 | lm loss: 7.921748E+00 | loss scale: 8192.0 | grad norm: 94093.080 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 680/ 159576 | consumed samples: 10880 | elapsed time per iteration (ms): 14089.6 | learning rate: 3.018E-06 | global batch size: 16 | lm loss: 7.932827E+00 | loss scale: 8192.0 | grad norm: 66492.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 681/ 159576 | consumed samples: 10896 | elapsed time per iteration (ms): 13869.3 | learning rate: 3.022E-06 | global batch size: 16 | lm loss: 7.712299E+00 | loss scale: 8192.0 | grad norm: 48293.630 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 682/ 159576 | consumed samples: 10912 | elapsed time per iteration (ms): 14135.1 | learning rate: 3.027E-06 | global batch size: 16 | lm loss: 7.638190E+00 | loss scale: 8192.0 | grad norm: 38847.818 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 683/ 159576 | consumed samples: 10928 | elapsed time per iteration (ms): 13923.5 | learning rate: 3.031E-06 | global batch size: 16 | lm loss: 7.728378E+00 | loss scale: 8192.0 | grad norm: 145094.985 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 684/ 159576 | consumed samples: 10944 | elapsed time per iteration (ms): 13370.2 | learning rate: 3.036E-06 | global batch size: 16 | lm loss: 7.695971E+00 | loss scale: 8192.0 | grad norm: 72337.161 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 685/ 159576 | consumed samples: 10960 | elapsed time per iteration (ms): 14077.4 | learning rate: 3.040E-06 | global batch size: 16 | lm loss: 7.967864E+00 | loss scale: 8192.0 | grad norm: 60013.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 686/ 159576 | consumed samples: 10976 | elapsed time per iteration (ms): 13866.9 | learning rate: 3.044E-06 | global batch size: 16 | lm loss: 7.790969E+00 | loss scale: 8192.0 | grad norm: 66989.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 687/ 159576 | consumed samples: 10992 | elapsed time per iteration (ms): 13994.5 | learning rate: 3.049E-06 | global batch size: 16 | lm loss: 7.558614E+00 | loss scale: 8192.0 | grad norm: 
41316.798 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 688/ 159576 | consumed samples: 11008 | elapsed time per iteration (ms): 13732.9 | learning rate: 3.053E-06 | global batch size: 16 | lm loss: 7.831646E+00 | loss scale: 8192.0 | grad norm: 113582.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 689/ 159576 | consumed samples: 11024 | elapsed time per iteration (ms): 14223.7 | learning rate: 3.058E-06 | global batch size: 16 | lm loss: 7.934176E+00 | loss scale: 8192.0 | grad norm: 88203.837 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 690/ 159576 | consumed samples: 11040 | elapsed time per iteration (ms): 14149.5 | learning rate: 3.062E-06 | global batch size: 16 | lm loss: 8.017797E+00 | loss scale: 8192.0 | grad norm: 58624.816 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 691/ 159576 | consumed samples: 11056 | elapsed time per iteration (ms): 13400.2 | learning rate: 3.067E-06 | global batch size: 16 | lm loss: 7.660833E+00 | loss scale: 8192.0 | grad norm: 55959.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 692/ 159576 | consumed samples: 11072 | elapsed time per iteration (ms): 13833.8 | learning rate: 3.071E-06 | global batch size: 16 | lm loss: 7.664068E+00 | loss scale: 8192.0 | grad norm: 59276.124 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 693/ 159576 | consumed samples: 11088 | elapsed time per iteration (ms): 14240.4 | learning rate: 3.075E-06 | global batch size: 16 | lm loss: 7.707018E+00 | loss scale: 8192.0 | grad norm: 93883.971 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 694/ 159576 | consumed samples: 11104 | elapsed time per iteration (ms): 13875.3 | learning rate: 3.080E-06 | global batch size: 16 | lm loss: 7.786274E+00 | loss scale: 8192.0 | grad norm: 64903.918 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 695/ 159576 | consumed samples: 11120 | elapsed time per iteration (ms): 13813.0 | learning rate: 3.084E-06 | global batch size: 16 | lm loss: 7.512930E+00 | loss scale: 8192.0 | grad norm: 51983.944 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 696/ 159576 | consumed samples: 11136 | elapsed time per iteration (ms): 13976.3 | learning rate: 3.089E-06 | global batch size: 16 | lm loss: 7.692935E+00 | loss scale: 8192.0 | grad norm: 60144.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 697/ 159576 | consumed samples: 11152 | elapsed time per iteration (ms): 14241.9 | learning rate: 3.093E-06 | global batch size: 16 | lm loss: 7.665162E+00 | loss scale: 8192.0 | grad norm: 45825.959 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 698/ 159576 | consumed samples: 11168 | elapsed time per iteration (ms): 13633.7 | learning rate: 3.098E-06 | global batch size: 16 | lm loss: 7.619460E+00 | loss scale: 8192.0 | grad norm: 50817.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 699/ 159576 | consumed samples: 11184 | elapsed time per iteration (ms): 13862.8 | learning rate: 3.102E-06 
| global batch size: 16 | lm loss: 7.827911E+00 | loss scale: 8192.0 | grad norm: 55475.644 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 700/ 159576 | consumed samples: 11200 | elapsed time per iteration (ms): 13992.4 | learning rate: 3.107E-06 | global batch size: 16 | lm loss: 7.651889E+00 | loss scale: 8192.0 | grad norm: 41255.123 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 701/ 159576 | consumed samples: 11216 | elapsed time per iteration (ms): 13980.6 | learning rate: 3.111E-06 | global batch size: 16 | lm loss: 7.715150E+00 | loss scale: 8192.0 | grad norm: 54466.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 702/ 159576 | consumed samples: 11232 | elapsed time per iteration (ms): 13968.4 | learning rate: 3.115E-06 | global batch size: 16 | lm loss: 7.782993E+00 | loss scale: 8192.0 | grad norm: 52144.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 703/ 159576 | consumed samples: 11248 | elapsed time per iteration (ms): 13960.9 | learning rate: 3.120E-06 | global batch size: 16 | lm loss: 7.681329E+00 | loss scale: 8192.0 | grad norm: 51153.990 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 704/ 159576 | consumed samples: 11264 | elapsed time per iteration (ms): 14082.5 | learning rate: 3.124E-06 | global batch size: 16 | lm loss: 7.697348E+00 | loss scale: 8192.0 | grad norm: 30117.468 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 705/ 159576 | consumed samples: 11280 | elapsed time per iteration (ms): 13980.4 | learning rate: 3.129E-06 | global batch size: 16 | lm loss: 7.733425E+00 | loss scale: 8192.0 | grad norm: 49027.047 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 706/ 159576 | consumed samples: 11296 | elapsed time per iteration (ms): 13865.4 | learning rate: 3.133E-06 | global batch size: 16 | lm loss: 7.844088E+00 | loss scale: 8192.0 | grad norm: 43555.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 707/ 159576 | consumed samples: 11312 | elapsed time per iteration (ms): 13817.5 | learning rate: 3.138E-06 | global batch size: 16 | lm loss: 7.752273E+00 | loss scale: 8192.0 | grad norm: 96517.184 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 708/ 159576 | consumed samples: 11328 | elapsed time per iteration (ms): 13958.9 | learning rate: 3.142E-06 | global batch size: 16 | lm loss: 7.757376E+00 | loss scale: 8192.0 | grad norm: 77216.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 709/ 159576 | consumed samples: 11344 | elapsed time per iteration (ms): 13428.3 | learning rate: 3.146E-06 | global batch size: 16 | lm loss: 7.687693E+00 | loss scale: 8192.0 | grad norm: 57064.888 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 710/ 159576 | consumed samples: 11360 | elapsed time per iteration (ms): 13648.2 | learning rate: 3.151E-06 | global batch size: 16 | lm loss: 7.663705E+00 | loss scale: 8192.0 | grad norm: 50512.811 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 711/ 159576 | consumed 
samples: 11376 | elapsed time per iteration (ms): 14017.0 | learning rate: 3.155E-06 | global batch size: 16 | lm loss: 7.597622E+00 | loss scale: 8192.0 | grad norm: 52114.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 712/ 159576 | consumed samples: 11392 | elapsed time per iteration (ms): 13780.7 | learning rate: 3.160E-06 | global batch size: 16 | lm loss: 7.771480E+00 | loss scale: 8192.0 | grad norm: 169756.868 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 713/ 159576 | consumed samples: 11408 | elapsed time per iteration (ms): 13096.8 | learning rate: 3.164E-06 | global batch size: 16 | lm loss: 7.713109E+00 | loss scale: 8192.0 | grad norm: 87094.017 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 714/ 159576 | consumed samples: 11424 | elapsed time per iteration (ms): 13743.9 | learning rate: 3.169E-06 | global batch size: 16 | lm loss: 7.749861E+00 | loss scale: 8192.0 | grad norm: 49749.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 715/ 159576 | consumed samples: 11440 | elapsed time per iteration (ms): 14274.0 | learning rate: 3.173E-06 | global batch size: 16 | lm loss: 7.797529E+00 | loss scale: 8192.0 | grad norm: 51932.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 716/ 159576 | consumed samples: 11456 | elapsed time per iteration (ms): 13788.8 | learning rate: 3.178E-06 | global batch size: 16 | lm loss: 7.704132E+00 | loss scale: 8192.0 | grad norm: 68478.047 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 717/ 159576 | consumed samples: 11472 | elapsed time per iteration (ms): 13977.5 | learning rate: 3.182E-06 | global batch size: 16 | lm loss: 7.746219E+00 | loss scale: 8192.0 | grad norm: 107770.469 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 718/ 159576 | consumed samples: 11488 | elapsed time per iteration (ms): 13786.8 | learning rate: 3.186E-06 | global batch size: 16 | lm loss: 7.617724E+00 | loss scale: 8192.0 | grad norm: 57419.512 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 719/ 159576 | consumed samples: 11504 | elapsed time per iteration (ms): 14003.5 | learning rate: 3.191E-06 | global batch size: 16 | lm loss: 7.642632E+00 | loss scale: 8192.0 | grad norm: 48000.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 720/ 159576 | consumed samples: 11520 | elapsed time per iteration (ms): 13651.1 | learning rate: 3.195E-06 | global batch size: 16 | lm loss: 7.790938E+00 | loss scale: 8192.0 | grad norm: 45384.886 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 721/ 159576 | consumed samples: 11536 | elapsed time per iteration (ms): 13820.3 | learning rate: 3.200E-06 | global batch size: 16 | lm loss: 7.799318E+00 | loss scale: 8192.0 | grad norm: 94827.685 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 722/ 159576 | consumed samples: 11552 | elapsed time per iteration (ms): 13998.9 | learning rate: 3.204E-06 | global batch size: 16 | lm loss: 7.924202E+00 | loss scale: 8192.0 | grad norm: 106713.536 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | time (ms) iteration 723/ 159576 | consumed samples: 11568 | elapsed time per iteration (ms): 13787.6 | learning rate: 3.209E-06 | global batch size: 16 | lm loss: 7.662113E+00 | loss scale: 8192.0 | grad norm: 53132.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 724/ 159576 | consumed samples: 11584 | elapsed time per iteration (ms): 14003.4 | learning rate: 3.213E-06 | global batch size: 16 | lm loss: 7.735355E+00 | loss scale: 8192.0 | grad norm: 46503.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 725/ 159576 | consumed samples: 11600 | elapsed time per iteration (ms): 14211.4 | learning rate: 3.217E-06 | global batch size: 16 | lm loss: 7.413515E+00 | loss scale: 8192.0 | grad norm: 46300.080 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 726/ 159576 | consumed samples: 11616 | elapsed time per iteration (ms): 14085.1 | learning rate: 3.222E-06 | global batch size: 16 | lm loss: 7.793005E+00 | loss scale: 8192.0 | grad norm: 123901.591 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 727/ 159576 | consumed samples: 11632 | elapsed time per iteration (ms): 13498.1 | learning rate: 3.226E-06 | global batch size: 16 | lm loss: 7.570110E+00 | loss scale: 8192.0 | grad norm: 110746.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 728/ 159576 | consumed samples: 11648 | elapsed time per iteration (ms): 13944.5 | learning rate: 3.231E-06 | global batch size: 16 | lm loss: 7.805285E+00 | loss scale: 8192.0 | grad norm: 54666.569 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 729/ 159576 | consumed samples: 11664 | elapsed time per iteration (ms): 13478.9 | learning rate: 3.235E-06 | global batch size: 16 | lm loss: 7.702326E+00 | loss scale: 8192.0 | grad norm: 95219.862 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 730/ 159576 | consumed samples: 11680 | elapsed time per iteration (ms): 13419.9 | learning rate: 3.240E-06 | global batch size: 16 | lm loss: 7.694516E+00 | loss scale: 8192.0 | grad norm: 44428.528 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 731/ 159576 | consumed samples: 11696 | elapsed time per iteration (ms): 13890.7 | learning rate: 3.244E-06 | global batch size: 16 | lm loss: 7.656667E+00 | loss scale: 8192.0 | grad norm: 79142.888 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 732/ 159576 | consumed samples: 11712 | elapsed time per iteration (ms): 14381.2 | learning rate: 3.249E-06 | global batch size: 16 | lm loss: 7.689932E+00 | loss scale: 8192.0 | grad norm: 69883.450 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-24 05:03:31] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1162855_[1-10%1] on 'gpu_p13' partition) [2021-09-24 05:03:31] PULSE: tr8-104B is running for 1:02:26 since 2021-09-24T04:01:05 (1162747 on 'gpu_p13' partition 
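The PULSE lines above show the resume mechanism at work: a follow-up job array (1162855_[1-10%1]) is queued behind the running job (1162747) via SLURM's dependency feature, so training restarts automatically when the current allocation ends. The records also allow a rough wall-clock projection. The snippet below is a hypothetical back-of-envelope illustration, not part of the training code; it assumes the ~13-14 s/iteration and the 16-sample global batch size logged here stay constant, which they will not once the batch-size ramp-up progresses.

# Naive time-to-completion estimate from the log fields above.
# Hypothetical illustration; assumes the pace at iteration 732 holds.
TOTAL_ITERS = 159576
done = 732
secs_per_iter = 13.9                      # midpoint of the ~13.1-14.7 s values logged above
remaining = TOTAL_ITERS - done            # 158844 iterations left
eta_days = remaining * secs_per_iter / 86400
print(f"naive ETA: {eta_days:.1f} days")  # ~25.6 days at this per-iteration pace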
iteration 733/ 159576 | consumed samples: 11728 | elapsed time per iteration (ms): 13725.2 | learning rate: 3.253E-06 | global batch size: 16 | lm loss: 7.808900E+00 | loss scale: 8192.0 | grad norm: 50692.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 734/ 159576 | consumed samples: 11744 | elapsed time per iteration (ms): 13115.2 | learning rate: 3.257E-06 | global batch size: 16 | lm loss: 7.737029E+00 | loss scale: 8192.0 | grad norm: 69149.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 735/ 159576 | consumed samples: 11760 | elapsed time per iteration (ms): 13493.9 | learning rate: 3.262E-06 | global batch size: 16 | lm loss: 7.630354E+00 | loss scale: 8192.0 | grad norm: 85240.602 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 736/ 159576 | consumed samples: 11776 | elapsed time per iteration (ms): 13636.0 | learning rate: 3.266E-06 | global batch size: 16 | lm loss: 7.626644E+00 | loss scale: 8192.0 | grad norm: 57646.552 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 737/ 159576 | consumed samples: 11792 | elapsed time per iteration (ms): 13810.1 | learning rate: 3.271E-06 | global batch size: 16 | lm loss: 7.526936E+00 | loss scale: 8192.0 | grad norm: 95065.076 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 738/ 159576 | consumed samples: 11808 | elapsed time per iteration (ms): 13385.6 | learning rate: 3.275E-06 | global batch size: 16 | lm loss: 7.820796E+00 | loss scale: 8192.0 | grad norm: 113407.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 739/ 159576 | consumed samples: 11824 | elapsed time per iteration (ms): 13689.8 | learning rate: 3.280E-06 | global batch size: 16 | lm loss: 7.774467E+00 | loss scale: 8192.0 | grad norm: 98657.078 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 740/ 159576 | consumed samples: 11840 | elapsed time per iteration (ms): 13965.2 | learning rate: 3.284E-06 | global batch size: 16 | lm loss: 7.762564E+00 | loss scale: 8192.0 | grad norm: 71745.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 741/ 159576 | consumed samples: 11856 | elapsed time per iteration (ms): 13569.2 | learning rate: 3.288E-06 | global batch size: 16 | lm loss: 7.608281E+00 | loss scale: 8192.0 | grad norm: 40905.544 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 742/ 159576 | consumed samples: 11872 | elapsed time per iteration (ms): 13635.8 | learning rate: 3.293E-06 | global batch size: 16 | lm loss: 7.570668E+00 | loss scale: 8192.0 | grad norm: 80257.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 743/ 159576 | consumed samples: 11888 | elapsed time per iteration (ms): 13669.8 | learning rate: 3.297E-06 | global batch size: 16 | lm loss: 7.586653E+00 | loss scale: 8192.0 | grad norm: 56412.186 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 744/ 159576 | consumed samples: 11904 | elapsed time per iteration (ms): 13473.9 | learning rate: 3.302E-06 | global batch size: 16 | lm loss: 7.701398E+00 | loss scale: 8192.0 | grad norm: 100221.753 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 745/ 159576 | consumed samples: 11920 | elapsed time per iteration (ms): 13453.8 | learning rate: 3.306E-06 | global batch size: 16 | lm loss: 7.772648E+00 | loss scale: 8192.0 | grad norm: 88519.971 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 746/ 159576 | consumed samples: 11936 | elapsed time per iteration (ms): 13732.5 | learning rate: 3.311E-06 | global batch size: 16 | lm loss: 7.940891E+00 | loss scale: 8192.0 | grad norm: 66980.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 747/ 159576 | consumed samples: 11952 | elapsed time per iteration (ms): 13956.5 | learning rate: 3.315E-06 | global batch size: 16 | lm loss: 7.879022E+00 | loss scale: 8192.0 | grad norm: 73008.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 748/ 159576 | consumed samples: 11968 | elapsed time per iteration (ms): 13250.5 | learning rate: 3.320E-06 | global batch size: 16 | lm loss: 7.693480E+00 | loss scale: 8192.0 | grad norm: 45346.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 749/ 159576 | consumed samples: 11984 | elapsed time per iteration (ms): 13529.3 | learning rate: 3.324E-06 | global batch size: 16 | lm loss: 7.658270E+00 | loss scale: 8192.0 | grad norm: 156261.718 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 750/ 159576 | consumed samples: 12000 | elapsed time per iteration (ms): 14110.0 | learning rate: 3.328E-06 | global batch size: 16 | lm loss: 7.741945E+00 | loss scale: 8192.0 | grad norm: 121818.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 751/ 159576 | consumed samples: 12016 | elapsed time per iteration (ms): 13463.3 | learning rate: 3.333E-06 | global batch size: 16 | lm loss: 7.631550E+00 | loss scale: 8192.0 | grad norm: 69835.617 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 752/ 159576 | consumed samples: 12032 | elapsed time per iteration (ms): 13424.2 | learning rate: 3.337E-06 | global batch size: 16 | lm loss: 7.669878E+00 | loss scale: 8192.0 | grad norm: 47821.077 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 753/ 159576 | consumed samples: 12048 | elapsed time per iteration (ms): 13566.2 | learning rate: 3.342E-06 | global batch size: 16 | lm loss: 7.567214E+00 | loss scale: 8192.0 | grad norm: 68234.683 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 754/ 159576 | consumed samples: 12064 | elapsed time per iteration (ms): 14065.3 | learning rate: 3.346E-06 | global batch size: 16 | lm loss: 7.753268E+00 | loss scale: 8192.0 | grad norm: 134900.848 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 755/ 159576 | consumed samples: 12080 | elapsed time per iteration (ms): 13518.6 | learning rate: 3.351E-06 | global batch size: 16 | lm loss: 7.552173E+00 | loss scale: 8192.0 | grad norm: 48964.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 756/ 159576 | consumed samples: 12096 | elapsed time per iteration (ms): 13728.7 | learning rate: 3.355E-06 | global batch size: 16 | lm loss: 7.735795E+00 | loss scale: 8192.0 | grad norm: 73204.769 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 757/ 159576 | consumed samples: 12112 | elapsed time per iteration (ms): 14082.3 | learning rate: 3.359E-06 | global batch size: 16 | lm loss: 7.910018E+00 | loss scale: 8192.0 | grad norm: 83429.905 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 758/ 159576 | consumed samples: 12128 | elapsed time per iteration (ms): 13428.5 | learning rate: 3.364E-06 | global batch size: 16 | lm loss: 7.669195E+00 | loss scale: 8192.0 | grad norm: 61137.847 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 759/ 159576 | consumed samples: 12144 | elapsed time per iteration (ms): 13632.1 | learning rate: 3.368E-06 | global batch size: 16 | lm loss: 7.795278E+00 | loss scale: 8192.0 | grad norm: 59141.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 760/ 159576 | consumed samples: 12160 | elapsed time per iteration (ms): 13624.6 | learning rate: 3.373E-06 | global batch size: 16 | lm loss: 7.692988E+00 | loss scale: 8192.0 | grad norm: 104447.460 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 761/ 159576 | consumed samples: 12176 | elapsed time per iteration (ms): 13611.0 | learning rate: 3.377E-06 | global batch size: 16 | lm loss: 7.784515E+00 | loss scale: 8192.0 | grad norm: 51368.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 762/ 159576 | consumed samples: 12192 | elapsed time per iteration (ms): 13558.6 | learning rate: 3.382E-06 | global batch size: 16 | lm loss: 7.582584E+00 | loss scale: 8192.0 | grad norm: 61983.639 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 763/ 159576 | consumed samples: 12208 | elapsed time per iteration (ms): 13793.4 | learning rate: 3.386E-06 | global batch size: 16 | lm loss: 7.743572E+00 | loss scale: 8192.0 | grad norm: 56837.599 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 764/ 159576 | consumed samples: 12224 | elapsed time per iteration (ms): 13743.7 | learning rate: 3.391E-06 | global batch size: 16 | lm loss: 7.701952E+00 | loss scale: 8192.0 | grad norm: 92476.492 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 765/ 159576 | consumed samples: 12240 | elapsed time per iteration (ms): 13529.8 | learning rate: 3.395E-06 | global batch size: 16 | lm loss: 7.691103E+00 | loss scale: 8192.0 | grad norm: 103276.953 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 766/ 159576 | consumed samples: 12256 | elapsed time per iteration (ms): 13189.2 | learning rate: 3.399E-06 | global batch size: 16 | lm loss: 7.589336E+00 | loss scale: 8192.0 | grad norm: 54735.017 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 767/ 159576 | consumed samples: 12272 | elapsed time per iteration (ms): 13483.6 | learning rate: 3.404E-06 | global batch size: 16 | lm loss: 7.717595E+00 | loss scale: 8192.0 | grad norm: 54456.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 768/ 159576 | consumed samples: 12288 | elapsed time per iteration (ms): 13780.9 | learning rate: 3.408E-06 | global batch size: 16 | lm loss: 7.852913E+00 | loss scale: 8192.0 | grad norm: 88912.086 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 769/ 159576 | consumed samples: 12304 | elapsed time per iteration (ms): 13724.3 | learning rate: 3.413E-06 | global batch size: 16 | lm loss: 7.716819E+00 | loss scale: 8192.0 | grad norm: 102833.662 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 770/ 159576 | consumed samples: 12320 | elapsed time per iteration (ms): 13377.3 | learning rate: 3.417E-06 | global batch size: 16 | lm loss: 7.597641E+00 | loss scale: 8192.0 | grad norm: 50835.662 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 771/ 159576 | consumed samples: 12336 | elapsed time per iteration (ms): 13692.5 | learning rate: 3.422E-06 | global batch size: 16 | lm loss: 7.478999E+00 | loss scale: 8192.0 | grad norm: 53587.154 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 772/ 159576 | consumed samples: 12352 | elapsed time per iteration (ms): 14180.5 | learning rate: 3.426E-06 | global batch size: 16 | lm loss: 7.546258E+00 | loss scale: 8192.0 | grad norm: 63294.983 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 773/ 159576 | consumed samples: 12368 | elapsed time per iteration (ms): 13096.5 | learning rate: 3.430E-06 | global batch size: 16 | lm loss: 7.711743E+00 | loss scale: 8192.0 | grad norm: 99934.626 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 774/ 159576 | consumed samples: 12384 | elapsed time per iteration (ms): 13520.5 | learning rate: 3.435E-06 | global batch size: 16 | lm loss: 7.645664E+00 | loss scale: 8192.0 | grad norm: 56458.777 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 775/ 159576 | consumed samples: 12400 | elapsed time per iteration (ms): 13630.5 | learning rate: 3.439E-06 | global batch size: 16 | lm loss: 7.603559E+00 | loss scale: 8192.0 | grad norm: 46450.456 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 776/ 159576 | consumed samples: 12416 | elapsed time per iteration (ms): 14027.6 | learning rate: 3.444E-06 | global batch size: 16 | lm loss: 7.737686E+00 | loss scale: 8192.0 | grad norm: 141770.957 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 777/ 159576 | consumed samples: 12432 | elapsed time per iteration (ms): 13425.6 | learning rate: 3.448E-06 | global batch size: 16 | lm loss: 7.584914E+00 | loss scale: 8192.0 | grad norm: 124071.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 778/ 159576 | consumed samples: 12448 | elapsed time per iteration (ms): 13642.7 | learning rate: 3.453E-06 | global batch size: 16 | lm loss: 7.606685E+00 | loss scale: 8192.0 | grad norm: 53139.139 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 779/ 159576 | consumed samples: 12464 | elapsed time per iteration (ms): 13834.1 | learning rate: 3.457E-06 | global batch size: 16 | lm loss: 7.786515E+00 | loss scale: 8192.0 | grad norm: 58657.499 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 780/ 159576 | consumed samples: 12480 | elapsed time per iteration (ms): 13091.5 | learning rate: 3.462E-06 | global batch size: 16 | lm loss: 7.618142E+00 | loss scale: 8192.0 | grad norm: 37881.566 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 781/ 159576 | consumed samples: 12496 | elapsed time per iteration (ms): 14146.0 | learning rate: 3.466E-06 | global batch size: 16 | lm loss: 7.906812E+00 | loss scale: 8192.0 | grad norm: 114163.942 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 782/ 159576 | consumed samples: 12512 | elapsed time per iteration (ms): 14025.7 | learning rate: 3.470E-06 | global batch size: 16 | lm loss: 7.566094E+00 | loss scale: 8192.0 | grad norm: 46220.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 783/ 159576 | consumed samples: 12528 | elapsed time per iteration (ms): 13895.4 | learning rate: 3.475E-06 | global batch size: 16 | lm loss: 7.630446E+00 | loss scale: 8192.0 | grad norm: 64319.125 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 784/ 159576 | consumed samples: 12544 | elapsed time per iteration (ms): 13890.1 | learning rate: 3.479E-06 | global batch size: 16 | lm loss: 7.692337E+00 | loss scale: 8192.0 | grad norm: 48575.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 785/ 159576 | consumed samples: 12560 | elapsed time per iteration (ms): 14156.1 | learning rate: 3.484E-06 | global batch size: 16 | lm loss: 7.736514E+00 | loss scale: 8192.0 | grad norm: 90651.125 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 786/ 159576 | consumed samples: 12576 | elapsed time per iteration (ms): 14206.7 | learning rate: 3.488E-06 | global batch size: 16 | lm loss: 7.744794E+00 | loss scale: 8192.0 | grad norm: 84355.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 787/ 159576 | consumed samples: 12592 | elapsed time per iteration (ms): 13622.2 | learning rate: 3.493E-06 | global batch size: 16 | lm loss: 7.672806E+00 | loss scale: 8192.0 | grad norm: 51705.493 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 788/ 159576 | consumed samples: 12608 | elapsed time per iteration (ms): 13771.2 | learning rate: 3.497E-06 | global batch size: 16 | lm loss: 7.713612E+00 | loss scale: 8192.0 | grad norm: 50748.595 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 789/ 159576 | consumed samples: 12624 | elapsed time per iteration (ms): 14226.1 | learning rate: 3.501E-06 | global batch size: 16 | lm loss: 7.630927E+00 | loss scale: 8192.0 | grad norm: 68226.483 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 790/ 159576 | consumed samples: 12640 | elapsed time per iteration (ms): 14175.2 | learning rate: 3.506E-06 | global batch size: 16 | lm loss: 7.523444E+00 | loss scale: 8192.0 | grad norm: 67731.569 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 791/ 159576 | consumed samples: 12656 | elapsed time per iteration (ms): 13844.2 | learning rate: 3.510E-06 | global batch size: 16 | lm loss: 7.357096E+00 | loss scale: 8192.0 | grad norm: 45569.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 792/ 159576 | consumed samples: 12672 | elapsed time per iteration (ms): 13884.3 | learning rate: 3.515E-06 | global batch size: 16 | lm loss: 7.701885E+00 | loss scale: 8192.0 | grad norm: 53017.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 793/ 159576 | consumed samples: 12688 | elapsed time per iteration (ms): 14159.9 | learning rate: 3.519E-06 | global batch size: 16 | lm loss: 7.529918E+00 | loss scale: 8192.0 | grad norm: 55466.888 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 794/ 159576 | consumed samples: 12704 | elapsed time per iteration (ms): 13975.0 | learning rate: 3.524E-06 | global batch size: 16 | lm loss: 7.684763E+00 | loss scale: 8192.0 | grad norm: 44801.760 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 795/ 159576 | consumed samples: 12720 | elapsed time per iteration (ms): 13769.3 | learning rate: 3.528E-06 | global batch size: 16 | lm loss: 7.843237E+00 | loss scale: 8192.0 | grad norm: 59761.590 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 796/ 159576 | consumed samples: 12736 | elapsed time per iteration (ms): 13954.1 | learning rate: 3.533E-06 | global batch size: 16 | lm loss: 7.737316E+00 | loss scale: 8192.0 | grad norm: 66240.870 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 797/ 159576 | consumed samples: 12752 | elapsed time per iteration (ms): 13982.4 | learning rate: 3.537E-06 | global batch size: 16 | lm loss: 7.712746E+00 | loss scale: 8192.0 | grad norm: 53315.803 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 798/ 159576 | consumed samples: 12768 | elapsed time per iteration (ms): 14164.1 | learning rate: 3.541E-06 | global batch size: 16 | lm loss: 7.649867E+00 | loss scale: 8192.0 | grad norm: 46451.967 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 799/ 159576 | consumed samples: 12784 | elapsed time per iteration (ms): 14010.0 | learning rate: 3.546E-06 | global batch size: 16 | lm loss: 7.833376E+00 | loss scale: 8192.0 | grad norm: 65829.045 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 800/ 159576 | consumed samples: 12800 | elapsed time per iteration (ms): 14307.9 | learning rate: 3.550E-06 | global batch size: 16 | lm loss: 7.790625E+00 | loss scale: 8192.0 | grad norm: 71968.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 801/ 159576 | consumed samples: 12816 | elapsed time per iteration (ms): 13972.6 | learning rate: 3.555E-06 | global batch size: 16 | lm loss: 7.611866E+00 | loss scale: 8192.0 | grad norm: 48597.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 802/ 159576 | consumed samples: 12832 | elapsed time per iteration (ms): 13959.0 | learning rate: 3.559E-06 | global batch size: 16 | lm loss: 7.617666E+00 | loss scale: 8192.0 | grad norm: 147672.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 803/ 159576 | consumed samples: 12848 | elapsed time per iteration (ms): 13806.4 | learning rate: 3.564E-06 | global batch size: 16 | lm loss: 7.813154E+00 | loss scale: 8192.0 | grad norm: 121980.871 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 804/ 159576 | consumed samples: 12864 | elapsed time per iteration (ms): 13949.2 | learning rate: 3.568E-06 | global batch size: 16 | lm loss: 7.654176E+00 | loss scale: 8192.0 | grad norm: 52351.960 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 805/ 159576 | consumed samples: 12880 | elapsed time per iteration (ms): 13801.9 | learning rate: 3.572E-06 | global batch size: 16 | lm loss: 7.564305E+00 | loss scale: 8192.0 | grad norm: 62792.545 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 806/ 159576 | consumed samples: 12896 | elapsed time per iteration (ms): 13954.3 | learning rate: 3.577E-06 | global batch size: 16 | lm loss: 7.707185E+00 | loss scale: 8192.0 | grad norm: 64767.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 807/ 159576 | consumed samples: 12912 | elapsed time per iteration (ms): 14250.4 | learning rate: 3.581E-06 | global batch size: 16 | lm loss: 7.578569E+00 | loss scale: 8192.0 | grad norm: 73926.917 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 808/ 159576 | consumed samples: 12928 | elapsed time per iteration (ms): 14201.0 | learning rate: 3.586E-06 | global batch size: 16 | lm loss: 7.631069E+00 | loss scale: 8192.0 | grad norm: 110069.754 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 809/ 159576 | consumed samples: 12944 | elapsed time per iteration (ms): 13598.4 | learning rate: 3.590E-06 | global batch size: 16 | lm loss: 7.628491E+00 | loss scale: 8192.0 | grad norm: 49670.988 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 810/ 159576 | consumed samples: 12960 | elapsed time per iteration (ms): 13941.6 | learning rate: 3.595E-06 | global batch size: 16 | lm loss: 7.759563E+00 | loss scale: 8192.0 | grad norm: 45971.027 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 811/ 159576 | consumed samples: 12976 | elapsed time per iteration (ms): 14298.0 | learning rate: 3.599E-06 | global batch size: 16 | lm loss: 7.502759E+00 | loss scale: 8192.0 | grad norm: 77602.902 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 812/ 159576 | consumed samples: 12992 | elapsed time per iteration (ms): 13416.1 | learning rate: 3.604E-06 | global batch size: 16 | lm loss: 7.624804E+00 | loss scale: 8192.0 | grad norm: 95989.772 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 813/ 159576 | consumed samples: 13008 | elapsed time per iteration (ms): 13579.1 | learning rate: 3.608E-06 | global batch size: 16 | lm loss: 7.542982E+00 | loss scale: 8192.0 | grad norm: 52064.554 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 814/ 159576 | consumed samples: 13024 | elapsed time per iteration (ms): 14100.2 | learning rate: 3.612E-06 | global batch size: 16 | lm loss: 7.676429E+00 | loss scale: 8192.0 | grad norm: 38221.569 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 815/ 159576 | consumed samples: 13040 | elapsed time per iteration (ms): 14346.2 | learning rate: 3.617E-06 | global batch size: 16 | lm loss: 7.695131E+00 | loss scale: 8192.0 | grad norm: 57869.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 816/ 159576 | consumed samples: 13056 | elapsed time per iteration (ms): 13771.7 | learning rate: 3.621E-06 | global batch size: 16 | lm loss: 7.578337E+00 | loss scale: 8192.0 | grad norm: 49771.695 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 817/ 159576 | consumed samples: 13072 | elapsed time per iteration (ms): 13776.0 | learning rate: 3.626E-06 | global batch size: 16 | lm loss: 7.583301E+00 | loss scale: 8192.0 | grad norm: 46160.592 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 818/ 159576 | consumed samples: 13088 | elapsed time per iteration (ms): 14040.8 | learning rate: 3.630E-06 | global batch size: 16 | lm loss: 7.773385E+00 | loss scale: 8192.0 | grad norm: 42207.098 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 819/ 159576 | consumed samples: 13104 | elapsed time per iteration (ms): 13835.3 | learning rate: 3.635E-06 | global batch size: 16 | lm loss: 7.905573E+00 | loss scale: 8192.0 | grad norm: 111883.611 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 820/ 159576 | consumed samples: 13120 | elapsed time per iteration (ms): 13924.4 | learning rate: 3.639E-06 | global batch size: 16 | lm loss: 7.730550E+00 | loss scale: 8192.0 | grad norm: 75433.173 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 821/ 159576 | consumed samples: 13136 | elapsed time per iteration (ms): 13915.0 | learning rate: 3.643E-06 | global batch size: 16 | lm loss: 7.688564E+00 | loss scale: 8192.0 | grad norm: 41927.693 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 822/ 159576 | consumed samples: 13152 | elapsed time per iteration (ms): 13890.4 | learning rate: 3.648E-06 | global batch size: 16 | lm loss: 7.552343E+00 | loss scale: 8192.0 | grad norm: 96543.909 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 823/ 159576 | consumed samples: 13168 | elapsed time per iteration (ms): 13560.6 | learning rate: 3.652E-06 | global batch size: 16 | lm loss: 7.617982E+00 | loss scale: 8192.0 | grad norm: 56370.152 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 824/ 159576 | consumed samples: 13184 | elapsed time per iteration (ms): 14024.1 | learning rate: 3.657E-06 | global batch size: 16 | lm loss: 7.600199E+00 | loss scale: 8192.0 | grad norm: 61928.907 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 825/ 159576 | consumed samples: 13200 | elapsed time per iteration (ms): 14003.2 | learning rate: 3.661E-06 | global batch size: 16 | lm loss: 7.541789E+00 | loss scale: 8192.0 | grad norm: 56863.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration
826/ 159576 | consumed samples: 13216 | elapsed time per iteration (ms): 13848.3 | learning rate: 3.666E-06 | global batch size: 16 | lm loss: 7.782004E+00 | loss scale: 8192.0 | grad norm: 59985.533 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 827/ 159576 | consumed samples: 13232 | elapsed time per iteration (ms): 13902.1 | learning rate: 3.670E-06 | global batch size: 16 | lm loss: 7.733065E+00 | loss scale: 8192.0 | grad norm: 39148.960 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 828/ 159576 | consumed samples: 13248 | elapsed time per iteration (ms): 14356.1 | learning rate: 3.675E-06 | global batch size: 16 | lm loss: 7.625387E+00 | loss scale: 8192.0 | grad norm: 56612.459 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 829/ 159576 | consumed samples: 13264 | elapsed time per iteration (ms): 14368.0 | learning rate: 3.679E-06 | global batch size: 16 | lm loss: 7.759684E+00 | loss scale: 8192.0 | grad norm: 67635.907 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 830/ 159576 | consumed samples: 13280 | elapsed time per iteration (ms): 13627.9 | learning rate: 3.683E-06 | global batch size: 16 | lm loss: 7.694915E+00 | loss scale: 8192.0 | grad norm: 60776.045 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 831/ 159576 | consumed samples: 13296 | elapsed time per iteration (ms): 13498.1 | learning rate: 3.688E-06 | global batch size: 16 | lm loss: 7.492978E+00 | loss scale: 8192.0 | grad norm: 42000.715 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 832/ 159576 | consumed samples: 13312 | elapsed time per iteration (ms): 13938.9 | learning rate: 3.692E-06 | global batch size: 16 | lm loss: 7.616700E+00 | loss scale: 8192.0 | grad norm: 105579.700 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 833/ 159576 | consumed samples: 13328 | elapsed time per iteration (ms): 13687.8 | learning rate: 3.697E-06 | global batch size: 16 | lm loss: 7.715961E+00 | loss scale: 8192.0 | grad norm: 78119.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 834/ 159576 | consumed samples: 13344 | elapsed time per iteration (ms): 13717.8 | learning rate: 3.701E-06 | global batch size: 16 | lm loss: 7.778497E+00 | loss scale: 8192.0 | grad norm: 58326.728 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 835/ 159576 | consumed samples: 13360 | elapsed time per iteration (ms): 13913.9 | learning rate: 3.706E-06 | global batch size: 16 | lm loss: 7.718093E+00 | loss scale: 8192.0 | grad norm: 48122.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 836/ 159576 | consumed samples: 13376 | elapsed time per iteration (ms): 14318.5 | learning rate: 3.710E-06 | global batch size: 16 | lm loss: 7.521303E+00 | loss scale: 8192.0 | grad norm: 60082.150 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 837/ 159576 | consumed samples: 13392 | elapsed time per iteration (ms): 13780.0 | learning rate: 3.714E-06 | global batch size: 16 | lm loss: 7.538383E+00 | loss scale: 8192.0 | grad norm: 61043.143 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 838/ 159576 | consumed samples: 13408 | elapsed time per iteration (ms): 13961.2 | learning rate: 3.719E-06 | global batch size: 16 | lm loss: 7.548276E+00 | loss scale: 8192.0 | grad norm: 58423.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 839/ 159576 | consumed samples: 13424 | elapsed time per iteration (ms): 14239.6 | learning rate: 3.723E-06 | global batch size: 16 | lm loss: 7.618182E+00 | loss scale: 8192.0 | grad norm: 48500.077 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 840/ 159576 | consumed samples: 13440 | elapsed time per iteration (ms): 13752.3 | learning rate: 3.728E-06 | global batch size: 16 | lm loss: 7.595082E+00 | loss scale: 8192.0 | grad norm: 50825.625 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 841/ 159576 | consumed samples: 13456 | elapsed time per iteration (ms): 14199.3 | learning rate: 3.732E-06 | global batch size: 16 | lm loss: 7.492725E+00 | loss scale: 8192.0 | grad norm: 56977.964 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 842/ 159576 | consumed samples: 13472 | elapsed time per iteration (ms): 13925.4 | learning rate: 3.737E-06 | global batch size: 16 | lm loss: 7.783816E+00 | loss scale: 8192.0 | grad norm: 40797.888 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 843/ 159576 | consumed samples: 13488 | elapsed time per iteration (ms): 14119.4 | learning rate: 3.741E-06 | global batch size: 16 | lm loss: 7.606951E+00 | loss scale: 8192.0 | grad norm: 50890.553 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 844/ 159576 | consumed samples: 13504 | elapsed time per iteration (ms): 13941.8 | learning rate: 3.746E-06 | global batch size: 16 | lm loss: 7.638199E+00 | loss scale: 8192.0 | grad norm: 52652.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 845/ 159576 | consumed samples: 13520 | elapsed time per iteration (ms): 14424.1 | learning rate: 3.750E-06 | global batch size: 16 | lm loss: 7.555171E+00 | loss scale: 8192.0 | grad norm: 48298.607 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 846/ 159576 | consumed samples: 13536 | elapsed time per iteration (ms): 14202.9 | learning rate: 3.754E-06 | global batch size: 16 | lm loss: 7.651504E+00 | loss scale: 8192.0 | grad norm: 76618.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 847/ 159576 | consumed samples: 13552 | elapsed time per iteration (ms): 13785.9 | learning rate: 3.759E-06 | global batch size: 16 | lm loss: 7.914087E+00 | loss scale: 8192.0 | grad norm: 40970.022 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 848/ 159576 | consumed samples: 13568 | elapsed time per iteration (ms): 13892.7 | learning rate: 3.763E-06 | global batch size: 16 | lm loss: 7.714731E+00 | loss scale: 8192.0 | grad norm: 47666.946 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 849/ 159576 | consumed samples: 13584 | elapsed time per iteration (ms): 13608.6 | learning rate: 3.768E-06 | global batch size: 16 | lm 
loss: 7.566309E+00 | loss scale: 8192.0 | grad norm: 56337.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 850/ 159576 | consumed samples: 13600 | elapsed time per iteration (ms): 13752.1 | learning rate: 3.772E-06 | global batch size: 16 | lm loss: 7.621016E+00 | loss scale: 8192.0 | grad norm: 55695.680 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 851/ 159576 | consumed samples: 13616 | elapsed time per iteration (ms): 13514.6 | learning rate: 3.777E-06 | global batch size: 16 | lm loss: 7.510153E+00 | loss scale: 8192.0 | grad norm: 70852.784 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 852/ 159576 | consumed samples: 13632 | elapsed time per iteration (ms): 13536.1 | learning rate: 3.781E-06 | global batch size: 16 | lm loss: 7.417966E+00 | loss scale: 8192.0 | grad norm: 43169.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 853/ 159576 | consumed samples: 13648 | elapsed time per iteration (ms): 14116.4 | learning rate: 3.786E-06 | global batch size: 16 | lm loss: 7.490001E+00 | loss scale: 8192.0 | grad norm: 61980.012 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 854/ 159576 | consumed samples: 13664 | elapsed time per iteration (ms): 14372.8 | learning rate: 3.790E-06 | global batch size: 16 | lm loss: 7.555287E+00 | loss scale: 8192.0 | grad norm: 43650.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 855/ 159576 | consumed samples: 13680 | elapsed time per iteration (ms): 13154.5 | learning rate: 3.794E-06 | global batch size: 16 | lm loss: 7.628311E+00 | loss scale: 8192.0 | grad norm: 32290.729 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 856/ 159576 | consumed samples: 13696 | elapsed time per iteration (ms): 13509.6 | learning rate: 3.799E-06 | global batch size: 16 | lm loss: 7.757495E+00 | loss scale: 8192.0 | grad norm: 94063.051 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 857/ 159576 | consumed samples: 13712 | elapsed time per iteration (ms): 14015.7 | learning rate: 3.803E-06 | global batch size: 16 | lm loss: 7.733263E+00 | loss scale: 8192.0 | grad norm: 53189.090 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 858/ 159576 | consumed samples: 13728 | elapsed time per iteration (ms): 14357.8 | learning rate: 3.808E-06 | global batch size: 16 | lm loss: 7.570580E+00 | loss scale: 8192.0 | grad norm: 57239.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 859/ 159576 | consumed samples: 13744 | elapsed time per iteration (ms): 13954.6 | learning rate: 3.812E-06 | global batch size: 16 | lm loss: 7.593122E+00 | loss scale: 8192.0 | grad norm: 45414.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 860/ 159576 | consumed samples: 13760 | elapsed time per iteration (ms): 14212.3 | learning rate: 3.817E-06 | global batch size: 16 | lm loss: 7.571471E+00 | loss scale: 8192.0 | grad norm: 75659.476 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 861/ 159576 | consumed samples: 13776 | elapsed time 
per iteration (ms): 14044.0 | learning rate: 3.821E-06 | global batch size: 16 | lm loss: 7.599829E+00 | loss scale: 8192.0 | grad norm: 47651.114 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 862/ 159576 | consumed samples: 13792 | elapsed time per iteration (ms): 13529.5 | learning rate: 3.825E-06 | global batch size: 16 | lm loss: 7.427186E+00 | loss scale: 8192.0 | grad norm: 76377.661 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 863/ 159576 | consumed samples: 13808 | elapsed time per iteration (ms): 14057.3 | learning rate: 3.830E-06 | global batch size: 16 | lm loss: 7.736305E+00 | loss scale: 8192.0 | grad norm: 76320.820 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 864/ 159576 | consumed samples: 13824 | elapsed time per iteration (ms): 14064.2 | learning rate: 3.834E-06 | global batch size: 16 | lm loss: 7.637553E+00 | loss scale: 8192.0 | grad norm: 56695.795 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 865/ 159576 | consumed samples: 13840 | elapsed time per iteration (ms): 14009.0 | learning rate: 3.839E-06 | global batch size: 16 | lm loss: 7.709378E+00 | loss scale: 8192.0 | grad norm: 77647.024 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 866/ 159576 | consumed samples: 13856 | elapsed time per iteration (ms): 13951.3 | learning rate: 3.843E-06 | global batch size: 16 | lm loss: 7.856131E+00 | loss scale: 8192.0 | grad norm: 85925.999 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 867/ 159576 | consumed samples: 13872 | elapsed time per iteration (ms): 14427.4 | learning rate: 3.848E-06 | global batch size: 16 | lm loss: 7.511599E+00 | loss scale: 8192.0 | grad norm: 50353.044 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 868/ 159576 | consumed samples: 13888 | elapsed time per iteration (ms): 14117.9 | learning rate: 3.852E-06 | global batch size: 16 | lm loss: 7.803133E+00 | loss scale: 8192.0 | grad norm: 73334.122 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 869/ 159576 | consumed samples: 13904 | elapsed time per iteration (ms): 13519.9 | learning rate: 3.857E-06 | global batch size: 16 | lm loss: 7.515793E+00 | loss scale: 8192.0 | grad norm: 73466.425 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 870/ 159576 | consumed samples: 13920 | elapsed time per iteration (ms): 13901.3 | learning rate: 3.861E-06 | global batch size: 16 | lm loss: 7.841221E+00 | loss scale: 8192.0 | grad norm: 74455.188 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 871/ 159576 | consumed samples: 13936 | elapsed time per iteration (ms): 14383.8 | learning rate: 3.865E-06 | global batch size: 16 | lm loss: 7.850037E+00 | loss scale: 8192.0 | grad norm: 49579.751 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 872/ 159576 | consumed samples: 13952 | elapsed time per iteration (ms): 14031.3 | learning rate: 3.870E-06 | global batch size: 16 | lm loss: 7.490081E+00 | loss scale: 8192.0 | grad norm: 71074.482 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | time (ms) iteration 873/ 159576 | consumed samples: 13968 | elapsed time per iteration (ms): 13971.5 | learning rate: 3.874E-06 | global batch size: 16 | lm loss: 7.783985E+00 | loss scale: 8192.0 | grad norm: 102193.504 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 874/ 159576 | consumed samples: 13984 | elapsed time per iteration (ms): 14176.3 | learning rate: 3.879E-06 | global batch size: 16 | lm loss: 7.557288E+00 | loss scale: 8192.0 | grad norm: 71546.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 875/ 159576 | consumed samples: 14000 | elapsed time per iteration (ms): 14495.9 | learning rate: 3.883E-06 | global batch size: 16 | lm loss: 7.703010E+00 | loss scale: 8192.0 | grad norm: 50279.497 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 876/ 159576 | consumed samples: 14016 | elapsed time per iteration (ms): 13722.6 | learning rate: 3.888E-06 | global batch size: 16 | lm loss: 7.542592E+00 | loss scale: 8192.0 | grad norm: 44841.536 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 877/ 159576 | consumed samples: 14032 | elapsed time per iteration (ms): 13946.5 | learning rate: 3.892E-06 | global batch size: 16 | lm loss: 7.776785E+00 | loss scale: 8192.0 | grad norm: 109756.647 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 878/ 159576 | consumed samples: 14048 | elapsed time per iteration (ms): 13948.7 | learning rate: 3.896E-06 | global batch size: 16 | lm loss: 7.728590E+00 | loss scale: 8192.0 | grad norm: 70820.820 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 879/ 159576 | consumed samples: 14064 | elapsed time per iteration (ms): 13882.9 | learning rate: 3.901E-06 | global batch size: 16 | lm loss: 7.672616E+00 | loss scale: 8192.0 | grad norm: 44570.920 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 880/ 159576 | consumed samples: 14080 | elapsed time per iteration (ms): 14042.4 | learning rate: 3.905E-06 | global batch size: 16 | lm loss: 7.680589E+00 | loss scale: 8192.0 | grad norm: 124008.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 881/ 159576 | consumed samples: 14096 | elapsed time per iteration (ms): 13930.7 | learning rate: 3.910E-06 | global batch size: 16 | lm loss: 7.501089E+00 | loss scale: 8192.0 | grad norm: 46056.517 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 882/ 159576 | consumed samples: 14112 | elapsed time per iteration (ms): 14239.7 | learning rate: 3.914E-06 | global batch size: 16 | lm loss: 7.571886E+00 | loss scale: 8192.0 | grad norm: 66612.529 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 883/ 159576 | consumed samples: 14128 | elapsed time per iteration (ms): 13486.8 | learning rate: 3.919E-06 | global batch size: 16 | lm loss: 7.536567E+00 | loss scale: 8192.0 | grad norm: 62829.154 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 884/ 159576 | consumed samples: 14144 | elapsed time per iteration (ms): 14209.0 | learning rate: 3.923E-06 | global batch size: 16 | lm loss: 7.794725E+00 | loss scale: 8192.0 | 
grad norm: 67729.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 885/ 159576 | consumed samples: 14160 | elapsed time per iteration (ms): 13720.4 | learning rate: 3.928E-06 | global batch size: 16 | lm loss: 7.468060E+00 | loss scale: 8192.0 | grad norm: 44457.501 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 886/ 159576 | consumed samples: 14176 | elapsed time per iteration (ms): 13867.7 | learning rate: 3.932E-06 | global batch size: 16 | lm loss: 7.478938E+00 | loss scale: 8192.0 | grad norm: 45629.682 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 887/ 159576 | consumed samples: 14192 | elapsed time per iteration (ms): 13805.2 | learning rate: 3.936E-06 | global batch size: 16 | lm loss: 7.427522E+00 | loss scale: 8192.0 | grad norm: 59355.003 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 888/ 159576 | consumed samples: 14208 | elapsed time per iteration (ms): 14520.3 | learning rate: 3.941E-06 | global batch size: 16 | lm loss: 7.602240E+00 | loss scale: 8192.0 | grad norm: 45450.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 889/ 159576 | consumed samples: 14224 | elapsed time per iteration (ms): 13870.2 | learning rate: 3.945E-06 | global batch size: 16 | lm loss: 7.682034E+00 | loss scale: 8192.0 | grad norm: 51153.138 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 890/ 159576 | consumed samples: 14240 | elapsed time per iteration (ms): 13708.4 | learning rate: 3.950E-06 | global batch size: 16 | lm loss: 7.558862E+00 | loss scale: 8192.0 | grad norm: 46389.657 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 891/ 159576 | consumed samples: 14256 | elapsed time per iteration (ms): 13645.4 | learning rate: 3.954E-06 | global batch size: 16 | lm loss: 7.527663E+00 | loss scale: 8192.0 | grad norm: 86582.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 892/ 159576 | consumed samples: 14272 | elapsed time per iteration (ms): 13652.2 | learning rate: 3.959E-06 | global batch size: 16 | lm loss: 7.675562E+00 | loss scale: 8192.0 | grad norm: 68924.015 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 893/ 159576 | consumed samples: 14288 | elapsed time per iteration (ms): 14020.9 | learning rate: 3.963E-06 | global batch size: 16 | lm loss: 7.534761E+00 | loss scale: 8192.0 | grad norm: 47359.573 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 894/ 159576 | consumed samples: 14304 | elapsed time per iteration (ms): 13841.4 | learning rate: 3.967E-06 | global batch size: 16 | lm loss: 7.447322E+00 | loss scale: 8192.0 | grad norm: 51692.050 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 895/ 159576 | consumed samples: 14320 | elapsed time per iteration (ms): 14037.6 | learning rate: 3.972E-06 | global batch size: 16 | lm loss: 7.507210E+00 | loss scale: 8192.0 | grad norm: 64045.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 896/ 159576 | consumed samples: 14336 | elapsed time per iteration (ms): 14109.9 | learning rate: 
3.976E-06 | global batch size: 16 | lm loss: 7.523023E+00 | loss scale: 8192.0 | grad norm: 62130.023 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 897/ 159576 | consumed samples: 14352 | elapsed time per iteration (ms): 14567.0 | learning rate: 3.981E-06 | global batch size: 16 | lm loss: 7.609581E+00 | loss scale: 8192.0 | grad norm: 45111.563 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 898/ 159576 | consumed samples: 14368 | elapsed time per iteration (ms): 13613.4 | learning rate: 3.985E-06 | global batch size: 16 | lm loss: 7.677504E+00 | loss scale: 8192.0 | grad norm: 77037.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 899/ 159576 | consumed samples: 14384 | elapsed time per iteration (ms): 13889.7 | learning rate: 3.990E-06 | global batch size: 16 | lm loss: 7.463535E+00 | loss scale: 8192.0 | grad norm: 63218.567 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 900/ 159576 | consumed samples: 14400 | elapsed time per iteration (ms): 13953.1 | learning rate: 3.994E-06 | global batch size: 16 | lm loss: 7.512316E+00 | loss scale: 8192.0 | grad norm: 45889.461 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 901/ 159576 | consumed samples: 14416 | elapsed time per iteration (ms): 14162.8 | learning rate: 3.999E-06 | global batch size: 16 | lm loss: 7.882708E+00 | loss scale: 8192.0 | grad norm: 42823.467 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 902/ 159576 | consumed samples: 14432 | elapsed time per iteration (ms): 13923.6 | learning rate: 4.003E-06 | global batch size: 16 | lm loss: 7.662213E+00 | loss scale: 8192.0 | grad norm: 61513.464 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 903/ 159576 | consumed samples: 14448 | elapsed time per iteration (ms): 14309.5 | learning rate: 4.007E-06 | global batch size: 16 | lm loss: 7.560106E+00 | loss scale: 8192.0 | grad norm: 69145.911 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 904/ 159576 | consumed samples: 14464 | elapsed time per iteration (ms): 13872.6 | learning rate: 4.012E-06 | global batch size: 16 | lm loss: 7.580536E+00 | loss scale: 8192.0 | grad norm: 50555.734 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 905/ 159576 | consumed samples: 14480 | elapsed time per iteration (ms): 13660.1 | learning rate: 4.016E-06 | global batch size: 16 | lm loss: 7.370582E+00 | loss scale: 8192.0 | grad norm: 58747.890 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 906/ 159576 | consumed samples: 14496 | elapsed time per iteration (ms): 14302.6 | learning rate: 4.021E-06 | global batch size: 16 | lm loss: 7.578561E+00 | loss scale: 8192.0 | grad norm: 51271.016 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 907/ 159576 | consumed samples: 14512 | elapsed time per iteration (ms): 13761.7 | learning rate: 4.025E-06 | global batch size: 16 | lm loss: 7.886317E+00 | loss scale: 8192.0 | grad norm: 103662.947 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 908/ 159576 | 
consumed samples: 14528 | elapsed time per iteration (ms): 13804.9 | learning rate: 4.030E-06 | global batch size: 16 | lm loss: 7.671743E+00 | loss scale: 8192.0 | grad norm: 73682.928 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 909/ 159576 | consumed samples: 14544 | elapsed time per iteration (ms): 13551.5 | learning rate: 4.034E-06 | global batch size: 16 | lm loss: 7.644366E+00 | loss scale: 8192.0 | grad norm: 44749.062 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 910/ 159576 | consumed samples: 14560 | elapsed time per iteration (ms): 14145.8 | learning rate: 4.038E-06 | global batch size: 16 | lm loss: 7.575992E+00 | loss scale: 8192.0 | grad norm: 123440.918 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 911/ 159576 | consumed samples: 14576 | elapsed time per iteration (ms): 13697.4 | learning rate: 4.043E-06 | global batch size: 16 | lm loss: 7.622074E+00 | loss scale: 8192.0 | grad norm: 106507.983 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 912/ 159576 | consumed samples: 14592 | elapsed time per iteration (ms): 13234.0 | learning rate: 4.047E-06 | global batch size: 16 | lm loss: 7.362756E+00 | loss scale: 8192.0 | grad norm: 47407.480 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 913/ 159576 | consumed samples: 14608 | elapsed time per iteration (ms): 13588.2 | learning rate: 4.052E-06 | global batch size: 16 | lm loss: 7.463619E+00 | loss scale: 8192.0 | grad norm: 52603.656 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 914/ 159576 | consumed samples: 14624 | elapsed time per iteration (ms): 13866.4 | learning rate: 4.056E-06 | global batch size: 16 | lm loss: 7.559254E+00 | loss scale: 8192.0 | grad norm: 75070.449 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 915/ 159576 | consumed samples: 14640 | elapsed time per iteration (ms): 13445.5 | learning rate: 4.061E-06 | global batch size: 16 | lm loss: 7.466935E+00 | loss scale: 8192.0 | grad norm: 84703.653 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 916/ 159576 | consumed samples: 14656 | elapsed time per iteration (ms): 13592.3 | learning rate: 4.065E-06 | global batch size: 16 | lm loss: 7.530110E+00 | loss scale: 8192.0 | grad norm: 68897.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 917/ 159576 | consumed samples: 14672 | elapsed time per iteration (ms): 13623.0 | learning rate: 4.070E-06 | global batch size: 16 | lm loss: 7.709665E+00 | loss scale: 8192.0 | grad norm: 42674.546 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 918/ 159576 | consumed samples: 14688 | elapsed time per iteration (ms): 13933.4 | learning rate: 4.074E-06 | global batch size: 16 | lm loss: 7.340624E+00 | loss scale: 8192.0 | grad norm: 62308.866 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 919/ 159576 | consumed samples: 14704 | elapsed time per iteration (ms): 13383.8 | learning rate: 4.078E-06 | global batch size: 16 | lm loss: 7.633225E+00 | loss scale: 8192.0 | grad norm: 101681.252 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 920/ 159576 | consumed samples: 14720 | elapsed time per iteration (ms): 13577.7 | learning rate: 4.083E-06 | global batch size: 16 | lm loss: 7.753546E+00 | loss scale: 8192.0 | grad norm: 64758.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 921/ 159576 | consumed samples: 14736 | elapsed time per iteration (ms): 13615.2 | learning rate: 4.087E-06 | global batch size: 16 | lm loss: 7.587958E+00 | loss scale: 8192.0 | grad norm: 50894.580 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 922/ 159576 | consumed samples: 14752 | elapsed time per iteration (ms): 13349.8 | learning rate: 4.092E-06 | global batch size: 16 | lm loss: 7.769899E+00 | loss scale: 8192.0 | grad norm: 142837.991 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 923/ 159576 | consumed samples: 14768 | elapsed time per iteration (ms): 13909.6 | learning rate: 4.096E-06 | global batch size: 16 | lm loss: 7.624977E+00 | loss scale: 8192.0 | grad norm: 83848.961 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 924/ 159576 | consumed samples: 14784 | elapsed time per iteration (ms): 13544.9 | learning rate: 4.101E-06 | global batch size: 16 | lm loss: 7.603238E+00 | loss scale: 8192.0 | grad norm: 56820.812 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 925/ 159576 | consumed samples: 14800 | elapsed time per iteration (ms): 14229.7 | learning rate: 4.105E-06 | global batch size: 16 | lm loss: 7.706733E+00 | loss scale: 8192.0 | grad norm: 76791.134 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 926/ 159576 | consumed samples: 14816 | elapsed time per iteration (ms): 13216.1 | learning rate: 4.109E-06 | global batch size: 16 | lm loss: 7.619715E+00 | loss scale: 8192.0 | grad norm: 71541.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 927/ 159576 | consumed samples: 14832 | elapsed time per iteration (ms): 13878.1 | learning rate: 4.114E-06 | global batch size: 16 | lm loss: 7.712871E+00 | loss scale: 8192.0 | grad norm: 73909.646 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 928/ 159576 | consumed samples: 14848 | elapsed time per iteration (ms): 13952.8 | learning rate: 4.118E-06 | global batch size: 16 | lm loss: 7.413386E+00 | loss scale: 8192.0 | grad norm: 57651.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 929/ 159576 | consumed samples: 14864 | elapsed time per iteration (ms): 13472.5 | learning rate: 4.123E-06 | global batch size: 16 | lm loss: 7.559020E+00 | loss scale: 8192.0 | grad norm: 91128.588 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 930/ 159576 | consumed samples: 14880 | elapsed time per iteration (ms): 13393.9 | learning rate: 4.127E-06 | global batch size: 16 | lm loss: 7.636448E+00 | loss scale: 8192.0 | grad norm: 48957.093 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 931/ 159576 | consumed samples: 14896 | elapsed time per iteration (ms): 13547.0 | learning rate: 4.132E-06 | global batch size: 16 | lm loss: 
7.639730E+00 | loss scale: 8192.0 | grad norm: 110788.722 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 932/ 159576 | consumed samples: 14912 | elapsed time per iteration (ms): 14018.3 | learning rate: 4.136E-06 | global batch size: 16 | lm loss: 7.652531E+00 | loss scale: 8192.0 | grad norm: 96359.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 933/ 159576 | consumed samples: 14928 | elapsed time per iteration (ms): 13449.4 | learning rate: 4.141E-06 | global batch size: 16 | lm loss: 7.671719E+00 | loss scale: 8192.0 | grad norm: 60936.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 934/ 159576 | consumed samples: 14944 | elapsed time per iteration (ms): 13624.9 | learning rate: 4.145E-06 | global batch size: 16 | lm loss: 7.672961E+00 | loss scale: 8192.0 | grad norm: 45848.114 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 935/ 159576 | consumed samples: 14960 | elapsed time per iteration (ms): 13787.5 | learning rate: 4.149E-06 | global batch size: 16 | lm loss: 7.740889E+00 | loss scale: 8192.0 | grad norm: 140359.981 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 936/ 159576 | consumed samples: 14976 | elapsed time per iteration (ms): 13643.3 | learning rate: 4.154E-06 | global batch size: 16 | lm loss: 7.595088E+00 | loss scale: 8192.0 | grad norm: 125926.574 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 937/ 159576 | consumed samples: 14992 | elapsed time per iteration (ms): 13588.2 | learning rate: 4.158E-06 | global batch size: 16 | lm loss: 7.580822E+00 | loss scale: 8192.0 | grad norm: 88915.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 938/ 159576 | consumed samples: 15008 | elapsed time per iteration (ms): 13606.3 | learning rate: 4.163E-06 | global batch size: 16 | lm loss: 7.766950E+00 | loss scale: 8192.0 | grad norm: 88671.645 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 939/ 159576 | consumed samples: 15024 | elapsed time per iteration (ms): 13894.4 | learning rate: 4.167E-06 | global batch size: 16 | lm loss: 7.578055E+00 | loss scale: 8192.0 | grad norm: 66434.885 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 940/ 159576 | consumed samples: 15040 | elapsed time per iteration (ms): 13885.0 | learning rate: 4.172E-06 | global batch size: 16 | lm loss: 7.837738E+00 | loss scale: 8192.0 | grad norm: 64490.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 941/ 159576 | consumed samples: 15056 | elapsed time per iteration (ms): 14127.9 | learning rate: 4.176E-06 | global batch size: 16 | lm loss: 7.961911E+00 | loss scale: 8192.0 | grad norm: 155493.780 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 942/ 159576 | consumed samples: 15072 | elapsed time per iteration (ms): 14120.5 | learning rate: 4.180E-06 | global batch size: 16 | lm loss: 7.581886E+00 | loss scale: 8192.0 | grad norm: 84829.182 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) saving checkpoint at iteration 942 to 
saving checkpoint at iteration 942 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
[2021-09-24 05:51:49,558] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/global_step942/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 942 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
time (ms) | save-checkpoint: 17459.68
[exiting program after 110.12040019432703 minutes] datetime: 2021-09-24 05:52:01
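The path in the save message shows DeepSpeed's checkpoint layout: one directory per tag (global_step942) holding mp_rank_00_model_states.pt plus per-rank optimizer state. A minimal sketch of the kind of call that produces it, with model_engine and iteration standing in for the run's real objects (illustrative only, not the run's actual code):

```python
def save_megatron_checkpoint(model_engine, iteration: int, save_dir: str) -> None:
    """Write a DeepSpeed checkpoint tagged like the save message above.

    DeepSpeed lays out <save_dir>/global_step<N>/mp_rank_00_model_states.pt
    and per-rank optimizer state files under the same tag directory.
    """
    model_engine.save_checkpoint(save_dir, tag=f"global_step{iteration}")
    # On restart, model_engine.load_checkpoint(save_dir) resumes from the
    # latest tag, which is how the next job picks up after iteration 942.
```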
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meets the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops require ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
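The report above shows every op except cpu_adam left to JIT compilation. If recompiling on first use is undesirable, DeepSpeed also supports prebuilding ops at install time through its DS_BUILD_* flags; a sketch, not the command actually used for this training:

    # Sketch: prebuild selected ops so they report installed=[YES] instead of being JIT-compiled.
    $ DS_BUILD_CPU_ADAM=1 DS_BUILD_FUSED_ADAM=1 pip install deepspeed
    $ ds_report   # prints the same op-compatibility table as above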
[WARNING] async_io requires the libraries: ['libaio-dev'] but they are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
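The async_io op is the one incompatible entry; as the warning says, it needs the libaio development package. A sketch of the fix on a Debian/Ubuntu host (on a managed cluster the package would have to come from the admins or a local prefix):

    # Sketch: install the missing headers, then optionally prebuild the op and re-check.
    $ sudo apt install libaio-dev
    $ DS_BUILD_AIO=1 pip install --no-cache-dir deepspeed
    $ ds_report   # async_io should now show as compatible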
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
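Because this summary is printed once per process, the same information can be collected once, up front, from a login shell; a sketch:

    # Sketch: reproduce the environment summary from a single process.
    $ python -c "import torch; print(torch.__version__, torch.version.cuda)"
    $ ds_report   # includes the torch/nvcc/deepspeed versions shown above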
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
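The git_hash=unknown values follow directly from git being absent from the job's PATH, so Megatron cannot record the code revision. A sketch of a preflight check for the launch script; the `module load git` line is hypothetical for this cluster:

    # Sketch: surface the missing-git problem before training starts.
    $ command -v git >/dev/null || module load git          # hypothetical module name
    $ command -v git >/dev/null || echo "git missing: Megatron will log git_hash=unknown"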
[OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: DeepSpeed general environment info: torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] .................... 1.8.1 torch version ....................torch cuda version 1.8.1............... 11.1 torch cuda versionnvcc version .................................... 11.111.2 nvcc versiondeepspeed install path ................................ 11.2 ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed install path deepspeed info........... ................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']0.4.2+bc17042, bc17042, big-science deepspeed infodeepspeed wheel compiled w. ......................... 0.4.2+bc17042, bc17042, big-sciencetorch 1.8, cuda 11.1 deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] ninja .................. [OKAY] transformer ............ [NO] ....... [OKAY] -------------------------------------------------- stochastic_transformer . [NO] ....... [OKAY] op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info deepspeed install path................... ........... 0.4.2+bc17042, bc17042, big-science ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed wheel compiled w. deepspeed info...... torch 1.8, cuda 11.1 ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ninja .................. [OKAY] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 -------------------------------------------------- deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] op name ................ 
installed .. compatible deepspeed info ................... 0.4.2+bc17042, bc17042, big-science -------------------------------------------------- -------------------------------------------------- deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 cpu_adam ............... [YES] ...... [OKAY] DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. fused_adam ............. [NO] ....... [OKAY] -------------------------------------------------- JIT compiled ops requires ninja fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- fused_adam ............. [NO] ....... [OKAY] JIT compiled ops requires ninja fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. 
[OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utilsutils .................. ..................[YES] [YES]...... ......[OKAY] [OKAY] quantizer ..............quantizer [NO].............. .......[NO] [OKAY]....... [OKAY]-------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY]-------------------------------------------------- fused_lamb DeepSpeed C++/CUDA extension op report............. 
[NO]-------------------------------------------------- .......NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. [OKAY] -------------------------------------------------- JIT compiled ops requires ninja sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- async_io ............... [NO] ....... [NO] cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] -------------------------------------------------- stochastic_transformer . [NO] ....... 
[OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 DeepSpeed general environment info:torch cuda version ............... 11.1 nvcc versiontorch install path ..................... ...............11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] deepspeed info ...................torch version ....................0.4.2+bc17042, bc17042, big-science 1.8.1deepspeed wheel compiled w. ......torch cuda version torch 1.8, cuda 11.1............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. 
Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference .. transformer_inference[NO] ......... [NO][OKAY] ....... [OKAY] utils .................. [YES]utils ........................ [OKAY][YES] ...... [OKAY] quantizer .............. quantizer[NO] .............. .......[NO] [OKAY]....... [OKAY] -------------------------------------------------- -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] /bin/sh: line 0: type: git: not found fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. utils .................. [YES] ...... [OKAY] async_io ............... [NO] ....... [NO] quantizer .............. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] .......async_io [NO]............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY]transformer_inference .. [NO] .......utils [OKAY].................. [YES] ...... [OKAY] utils .................. quantizer[YES] .................... [NO][OKAY] ....... [OKAY] quantizer .............. --------------------------------------------------[NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... 
[OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system stochastic_transformer . [NO] ....... [OKAY] meet the required dependencies to JIT install the op. 
--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io async_io............... [NO] ...................... [NO][NO] ....... [NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. -------------------------------------------------- async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninjaninja .................................... [OKAY] [OKAY] -------------------------------------------------- --------------------------------------------------op name op name................ installed................ .. installedcompatible ..-------------------------------------------------- compatible -------------------------------------------------- cpu_adam ............... [YES] cpu_adam...... [OKAY]............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_adam ............. fused_lamb[NO] .................... [NO] [OKAY]....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY]sparse_attn transformer............ ............[NO] [NO] .............. [OKAY][OKAY] transformerstochastic_transformer ............ .[NO] [NO]....... ....... [OKAY][OKAY] stochastic_transformer . [NO] ....... 
[OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. stochastic_transformer . [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer_inference .. [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] utils .................. [YES] ...... [OKAY] transformer_inference .. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] DeepSpeed C++/CUDA extension op report fused_adam ............. [NO] ....... 
[OKAY] -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. 
[OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. -------------------------------------------------- JIT compiled ops requires ninja async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- JIT compiled ops requires ninja cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. 
Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer_inference .. [NO] ....... [OKAY] async_io ...............utils [NO].................. .......[YES] [NO]...... [OKAY] quantizer .............. [NO] transformer_inference....... ..[OKAY] [NO] ....... [OKAY]-------------------------------------------------- utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... 
[NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. -------------------------------------------------- async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... 
[OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. quantizer .............. [NO] ....... [OKAY] async_io ............... [NO] async_io....... [NO] -------------------------------------------------- ............... [NO] ....... [NO]transformer_inference .. [NO] ....... [OKAY] utilstransformer_inference .................... [YES][NO] ............. [OKAY][OKAY] quantizer .............. utils[NO] ......................... [YES][OKAY] ...... [OKAY] -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: DeepSpeed general environment info:torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']1.8.1 torch cuda versiontorch version ................................... 11.11.8.1 nvcc version torch cuda version..................... ...............11.2 11.1deepspeed install path nvcc version........... ..................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] 11.2deepspeed info deepspeed install path................... ...........0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ...... deepspeed infotorch 1.8, cuda 11.1 ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. 
[NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] ninja .................. [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] -------------------------------------------------- stochastic_transformer . [NO] ....... [OKAY] op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. 
compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. DeepSpeed general environment info: async_io ............... [NO] ....... [NO]async_io DeepSpeed general environment info:torch install path ............... [NO] ....... [NO] ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version ....................['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] 1.8.1 transformer_inference .. [NO] ....... [OKAY] torch versiontorch cuda version ................................... 1.8.111.1 nvcc version torch cuda version..................... ............... 11.211.1 transformer_inference ..utils [NO].................. .......[YES] [OKAY]...... [OKAY] deepspeed install pathnvcc version ................................ 11.2['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed install pathdeepspeed info .............................. 0.4.2+bc17042, bc17042, big-science['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] quantizer utils.............. ..................[NO] [YES]....... [OKAY]...... deepspeed wheel compiled w.deepspeed info ......................... torch 1.8, cuda 11.10.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 [OKAY] -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.async_io ............... [NO] ....... [NO] transformer_inferenceasync_io .. ...............[NO] [NO]....... .......[OKAY] [NO] utils .................. [YES] ...... [OKAY] transformer_inference .. quantizer[NO] ..................... [NO][OKAY] ....... 
[OKAY] --------------------------------------------------utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... async_io[NO] ....... ............... [NO] ....... [NO] [NO] transformer_inferencetransformer_inference .... [NO] ....... [OKAY] [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] utils .................. quantizer[YES] .................... [NO][OKAY] ....... [OKAY] quantizer ..............-------------------------------------------------- [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... async_io[NO] ............... [NO] ....... [NO] transformer_inference ..transformer_inference [NO].. .......[NO] [OKAY]....... [OKAY] utils ..................utils [YES].................. ......[YES] [OKAY]...... [OKAY] quantizer .............. quantizer[NO] ..................... [NO][OKAY] ....... 
[OKAY]-------------------------------------------------- -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... 
[OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 -------------------------------------------------- torch cuda version ............... 11.1 nvcc version ..................... 11.2 DeepSpeed C++/CUDA extension op report deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science /bin/sh: line 0: type: git: not found -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ...............async_io [NO] ...................... [NO][NO] ....... [NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utils utils.................. ..................[YES] [YES]...... ......[OKAY] [OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] -------------------------------------------------- -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer ............ [NO] ....... [OKAY] async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inferencetransformer_inference .... [NO][NO] ....... .......[OKAY] stochastic_transformer . [NO] ....... [OKAY] [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizerquantizer ............................ [NO][NO] .............. 
[OKAY][OKAY] ----------------------------------------------------------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. 
compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- fused_lamb ............. [NO] ....... [OKAY] JIT compiled ops requires ninja sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] fused_lamb ............. 
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------

 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------

DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1

/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****

[... the same op report, async_io warning, environment info and git info are printed by every remaining rank; the interleaved duplicate copies are elided ...]
1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES]ninja ...... ..................[OKAY] [OKAY] -------------------------------------------------- op name ................ installed ..fused_adam compatible............. --------------------------------------------------[NO] ....... [OKAY] cpu_adamfused_lamb ............... .............[YES] [NO]...... .......[OKAY] [OKAY] fused_adamsparse_attn ......................... [NO] [NO]....... .......[OKAY] [OKAY] fused_lambtransformer ......................... [NO] [NO]....... .......[OKAY] [OKAY] stochastic_transformer . [NO] ....... sparse_attn ............ [OKAY][NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] transformer_inference .. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... async_io[NO] ....... ...............[NO] [NO] ....... [NO] transformer_inference .. [NO]transformer_inference ......... [OKAY][NO] ....... [OKAY] utils .................. [YES] utils...... ..................[OKAY] [YES] ...... [OKAY]quantizer .............. [NO] .......quantizer [OKAY].............. [NO] .......-------------------------------------------------- [OKAY] -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 
1.8.11.8.1 torch cuda version torch cuda version............... ...............11.1 11.1nvcc version nvcc version..................... .....................11.2 11.2deepspeed install path deepspeed install path........... ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference .. [NO] transformer_inference....... ..[OKAY] [NO] ....... [OKAY] utils .................. [YES] utils...... ..................[OKAY] [YES] ......quantizer [OKAY].............. [NO] .......quantizer [OKAY].............. [NO] .......-------------------------------------------------- [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] ninja-------------------------------------------------- .................. op name[OKAY] ................ installed-------------------------------------------------- .. op namecompatible ................-------------------------------------------------- installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] .......fused_adam [OKAY]............. [NO] ....... fused_lamb[OKAY] ............. [NO] fused_lamb....... .............[OKAY] [NO] ....... [OKAY] sparse_attn ............ [NO] sparse_attn....... ............[OKAY] [NO] .......transformer [OKAY]............ [NO] ....... transformer[OKAY] ............ [NO] ....... [OKAY]stochastic_transformer .stochastic_transformer [NO] ........ [OKAY][NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference ..transformer_inference [NO].. .......[NO] [OKAY]....... [OKAY] utils utils.................. ..................[YES] [YES]...... ......[OKAY] [OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path ............... ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] DeepSpeed general environment info: sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] stochastic_transformer . [NO] ....... [OKAY] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY][0m -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... 
[OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 async_io ............... [NO] ....... [NO] nvcc versionnvcc version .......................................... 11.211.2 transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science quantizer .............. [NO] ....... [OKAY] deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** quantizer .............. [NO] ....... 
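The op report and environment info above are emitted once per rank, which is why they repeat throughout the raw log; DeepSpeed's ds_report utility prints the same tables on demand. The version fields can also be cross-checked from a Python prompt (a minimal sketch, assuming the same conda environment as above is active):

    import torch
    import deepspeed

    # Fields from "DeepSpeed general environment info" above
    print(torch.__version__)      # 1.8.1
    print(torch.version.cuda)     # 11.1
    print(deepspeed.__version__)  # 0.4.2+bc17042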
using world size: 256, data-parallel-size: 8, tensor-model-parallel size: 4, pipeline-model-parallel size: 8
using torch.float16 for parameters ...
------------------------ arguments ------------------------
accumulate_allreduce_grads_in_fp32 .............. False
adam_beta1 ...................................... 0.9
adam_beta2 ...................................... 0.999
adam_eps ........................................ 1e-08
adlr_autoresume ................................. False
adlr_autoresume_interval ........................ 1000
apply_query_key_layer_scaling ................... True
apply_residual_connection_post_layernorm ........ False
attention_dropout ............................... 0.1
attention_softmax_in_fp32 ....................... False
bert_binary_head ................................ True
bert_load ....................................... None
bf16 ............................................ False
bias_dropout_fusion ............................. True
bias_gelu_fusion ................................ True
biencoder_projection_dim ........................ 0
biencoder_shared_query_context_model ............ False
block_data_path ................................. None
checkpoint_activations .......................... True
checkpoint_in_cpu ............................... False
checkpoint_num_layers ........................... 1
clip_grad ....................................... 1.0
codecarbon_dir .................................. /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/tr8-104B-logs/codecarbon
consumed_train_samples .......................... 0
consumed_valid_samples .......................... 0
contigious_checkpointing ........................ False
cpu_optimizer ................................... False
cpu_torch_adam .................................. False
data_impl ....................................... mmap
data_parallel_size .............................. 8
data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
dataloader_type ................................. single
DDP_impl ........................................ local
decoder_seq_length .............................. None
deepscale ....................................... False
deepscale_config ................................ None
deepspeed ....................................... True
deepspeed_activation_checkpointing .............. True
deepspeed_config ................................ ./ds_config.1164492.json
deepspeed_mpi ................................... False
distribute_checkpointed_activations ............. False
distributed_backend ............................. nccl
embedding_path .................................. None
encoder_seq_length .............................. 2048
eod_mask_loss ................................... False
eval_interval ................................... 1000
eval_iters ...................................... 5
evidence_data_path .............................. None
exit_duration_in_mins ........................... 1190
exit_interval ................................... None
ffn_hidden_size ................................. 20480
finetune ........................................ False
fp16 ............................................ True
fp16_lm_cross_entropy ........................... False
fp32_residual_connection ........................ False
global_batch_size ............................... 2048
hidden_dropout .................................. 0.1
hidden_size ..................................... 16384
hysteresis ...................................... 2
ict_head_size ................................... None
ict_load ........................................ None
img_dim ......................................... 224
indexer_batch_size .............................. 128
indexer_log_interval ............................ 1000
init_method_std ................................. 0.02
init_method_xavier_uniform ...................... False
initial_loss_scale .............................. 4294967296
kv_channels ..................................... 512
layernorm_epsilon ............................... 1e-05
lazy_mpu_init ................................... None
load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
local_rank ...................................... 0
log_batch_size_to_tensorboard ................... True
log_interval .................................... 1
log_learning_rate_to_tensorboard ................ True
log_loss_scale_to_tensorboard ................... True
log_num_zeros_in_grad ........................... False
log_params_norm ................................. False
log_timers_to_tensorboard ....................... True
log_validation_ppl_to_tensorboard ............... True
loss_scale ...................................... 12.0
loss_scale_window ............................... 1000
lr .............................................. 6e-05
lr_decay_iters .................................. None
lr_decay_samples ................................ 126953125
lr_decay_style .................................. cosine
lr_warmup_fraction .............................. None
lr_warmup_iters ................................. 0
lr_warmup_samples ............................... 216320
make_vocab_size_divisible_by .................... 128
mask_prob ....................................... 0.15
masked_softmax_fusion ........................... True
max_position_embeddings ......................... 2048
memory_centric_tiled_linear ..................... False
merge_file ...................................... /gpfswork/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/data/gpt2-merges.txt
micro_batch_size ................................ 1
min_loss_scale .................................. 1.0
min_lr .......................................... 6e-06
mmap_warmup ..................................... False
no_load_optim ................................... None
no_load_rng ..................................... None
no_save_optim ................................... None
no_save_rng ..................................... None
num_attention_heads ............................. 32
num_channels .................................... 3
num_classes ..................................... 1000
num_layers ...................................... 32
num_layers_per_virtual_pipeline_stage ........... None
num_workers ..................................... 2
onnx_safe ....................................... None
openai_gelu ..................................... False
optimizer ....................................... adam
override_lr_scheduler ........................... False
params_dtype .................................... torch.float16
partition_activations ........................... False
patch_dim ....................................... 16
pipeline_model_parallel_size .................... 8
position_embedding_type ......................... PositionEmbeddingType.absolute
profile_backward ................................ False
query_in_block_prob ............................. 0.1
rampup_batch_size ............................... ['16', '16', '6_000_000']
rank ............................................ 0
remote_device ................................... none
reset_attention_mask ............................ False
reset_position_ids .............................. False
retriever_report_topk_accuracies ................ []
retriever_score_scaling ......................... False
retriever_seq_length ............................ 256
sample_rate ..................................... 1.0
save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
save_interval ................................... 1500
scatter_gather_tensors_in_pipeline .............. True
scattered_embeddings ............................ False
seed ............................................ 42
seq_length ...................................... 2048
sgd_momentum .................................... 0.9
short_seq_prob .................................. 0.1
split ........................................... 949,50,1
split_transformers .............................. False
synchronize_each_layer .......................... False
tensor_model_parallel_size ...................... 4
tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/tr8-104B-logs/tensorboard
tensorboard_log_interval ........................ 1
tensorboard_queue_size .......................... 5
tile_factor ..................................... 1
titles_data_path ................................ None
tokenizer_name_or_path .......................... None
tokenizer_type .................................. GPT2BPETokenizer
train_iters ..................................... None
train_samples ................................... 300000000
use_checkpoint_lr_scheduler ..................... False
use_contiguous_buffers_in_ddp ................... False
use_cpu_initialization .......................... None
use_one_sent_docs ............................... False
use_pin_memory .................................. False
virtual_pipeline_model_parallel_size ............ None
vocab_extra_ids ................................. 0
vocab_file ...................................... /gpfswork/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/data/gpt2-vocab.json
weight_decay .................................... 0.1
world_size ...................................... 256
zero_allgather_bucket_size ...................... 0.0
zero_contigious_gradients ....................... False
zero_reduce_bucket_size ......................... 0.0
zero_reduce_scatter ............................. False
zero_stage ...................................... 1
-------------------- end of arguments ---------------------
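As a sanity check on the parallelism degrees in the argument dump: each GPU belongs to exactly one data-parallel, one tensor-model-parallel and one pipeline-model-parallel group, so the three degrees must multiply out to the world size (a minimal check, values taken from the dump above):

    # world_size = data_parallel x tensor_model_parallel x pipeline_model_parallel
    dp, tp, pp = 8, 4, 8
    assert dp * tp * pp == 256  # matches world_size in the arguments above

At the full global batch size this also fixes the gradient accumulation: global_batch_size = micro_batch_size x data_parallel_size x accumulation_steps, so 2048 = 1 x 8 x 256, i.e. 256 micro-batches are accumulated per optimizer step.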
will use batch size rampup starting from global batch size 16 to global batch size 2048 with batch size increments 16 over 6000000 samples.
> building GPT2BPETokenizer tokenizer ...
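The rampup line follows from rampup_batch_size = ['16', '16', '6_000_000'] above: the global batch size starts at 16 and grows by 16 until it reaches 2048, i.e. (2048 - 16) / 16 = 127 increments spread over 6,000,000 samples, roughly one increment every 47,000 samples. A rough sketch of such a schedule (an illustration of the idea, not Megatron's exact implementation):

    def rampup_global_batch_size(consumed_samples,
                                 start=16, increment=16,
                                 ramp_samples=6_000_000, final=2048):
        """Approximate batch-size rampup: step from `start` to `final`
        in `increment`-sized steps spread evenly over `ramp_samples`."""
        steps = (final - start) // increment      # 127 increments
        samples_per_step = ramp_samples // steps  # ~47,244 samples each
        step = min(consumed_samples // samples_per_step, steps)
        return start + step * increment

    assert rampup_global_batch_size(0) == 16
    assert rampup_global_batch_size(6_000_000) == 2048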
[OKAY] torch cuda versiontorch cuda version .............................. 11.111.1 quantizer .............. [NO] ....... [OKAY] nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] -------------------------------------------------- deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] async_io....... [NO]............... [NO] ....... [NO] transformer_inference .. [NO] .......transformer_inference [OKAY].. /bin/sh: line 0: type: git: not found [NO] ....... [OKAY]utils .................. [YES] ......utils [OKAY].................. [YES] ...... quantizer[OKAY] .............. [NO] quantizer....... ..............[OKAY] [NO] ....... --------------------------------------------------[OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** > setting tensorboard ...  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... 
[OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. fused_adam ............. [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found async_io ............... [NO] ....... [NO] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 
11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... 
[OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. fused_adam ............. [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] fused_lamb ............. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
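The same environment fields can be gathered with plain introspection; a sketch using only standard torch/deepspeed module attributes (this is not DeepSpeed's own ds_report implementation):

import subprocess
import torch
import deepspeed

print("torch install path ....", list(torch.__path__))
print("torch version .........", torch.__version__)
print("torch cuda version ....", torch.version.cuda)   # CUDA torch was built against
print("nvcc version ..........", subprocess.run(
    ["nvcc", "--version"], capture_output=True, text=True).stdout.strip().splitlines()[-1])
print("deepspeed install path ", list(deepspeed.__path__))
print("deepspeed version .....", deepspeed.__version__)

The `/bin/sh: line 0: type: git: not found` line above is evidently Megatron shelling out to probe for git; with git absent from PATH on the compute nodes, it falls back to git_hash=unknown and git_branch=unknown.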
11.1 nvcc version ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']..................... 11.2 torch versiondeepspeed install path ............................... 1.8.1 ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] torch cuda versiondeepspeed info .................................. 11.10.4.2+bc17042, bc17042, big-science nvcc versiondeepspeed wheel compiled w. ........................... 11.2torch 1.8, cuda 11.1 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** > padded vocab (size: 50257) with 431 dummy tokens (new size: 50688) /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] async_io ............... [NO] ....... [NO] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... 
> setting codecarbon ...
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meets the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
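DeepSpeed ships a ds_report command that prints this same table on demand; programmatically, each row corresponds to an op builder whose is_compatible() check fills the right-hand column. A sketch, with the import path assumed from the DeepSpeed 0.4.x source tree:

    from deepspeed.ops.op_builder import CPUAdamBuilder

    # mirrors the "cpu_adam ... compatible" row of the report above
    print(CPUAdamBuilder().is_compatible())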
> initializing torch distributed ...
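This step boils down to a torch.distributed.init_process_group call on every rank; a minimal sketch, assuming the launcher exports the usual RANK/WORLD_SIZE/MASTER_ADDR/MASTER_PORT variables:

    import os
    import torch

    torch.distributed.init_process_group(
        backend="nccl",        # NCCL for multi-GPU training
        init_method="env://",  # rendezvous via MASTER_ADDR/MASTER_PORT
        world_size=int(os.environ["WORLD_SIZE"]),
        rank=int(os.environ["RANK"]),
    )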
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
[... the same environment report and Git-info message, repeated by the remaining ranks, elided ...]
> initializing tensor model parallel with size 4
> initializing pipeline model parallel with size 8
> setting random seeds to 42 ...
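Decoding those three lines: tensor parallel 4 and pipeline parallel 8 leave a data-parallel degree of 8 on this run's 256 GPUs. A minimal sanity check of the arithmetic (the world size of 256 is taken from the 0..255 rank numbering in the topology dump further down):

```python
# Degrees of parallelism reported at startup; DP is whatever is left
# once TP and PP are fixed: world_size = TP * PP * DP.
TP, PP = 4, 8          # tensor- and pipeline-parallel sizes from the log
WORLD_SIZE = 256       # ranks 0..255 appear in the topology dump below
DP = WORLD_SIZE // (TP * PP)
assert TP * PP * DP == WORLD_SIZE
print(f"data parallel size = {DP}")   # -> 8
```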
[2021-09-24 05:52:24,592] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2760 and data parallel seed: 42
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/data'
>>> done with dataset index builder. Compilation time: 0.299 seconds
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
!! WARNING !! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !! WARNING !!
warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Building extension module scaled_masked_softmax_cuda...
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Building extension module fused_mix_prec_layer_norm_cuda...
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
[... the same compiler warning, emitted before each of the three builds and repeated once per rank, elided ...]
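The model parallel seed of 2760 logged at the top of this block is not arbitrary: Megatron derives it from the base seed plus a fixed offset of 2718 plus the tensor-parallel rank (the convention in megatron/mpu/random.py), while the data-parallel seed stays at the base value. A sketch of that derivation, with a function name of our own:

```python
def model_parallel_seed(base_seed: int, tp_rank: int) -> int:
    # Megatron offsets the base seed by 2718 (plus the tensor-parallel
    # rank) so dropout patterns differ across tensor-parallel ranks,
    # while data-parallel replicas keep the base seed.
    return base_seed + 2718 + tp_rank

assert model_parallel_seed(42, 0) == 2760   # matches the log line above
```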
>>> done with compiling and loading fused kernels. Compilation time: 19.795 seconds
time to initialize megatron (seconds): 12.702
[after megatron is initialized] datetime: 2021-09-24 05:52:44
building GPT model ...
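The compiler warning is benign here: the fused kernels were already built and cached, so ninja simply reloads them ("ninja: no work to do"). For the record, torch.utils.cpp_extension picks its compiler from the CXX environment variable and falls back to the plain c++ binary, which is what triggers the message; a workaround (not applied in this run, and assuming the g++ on $PATH is the one PyTorch was built with) would be:

```python
import os

# torch.utils.cpp_extension compiles JIT extensions with the binary named
# by CXX (falling back to 'c++'); exporting g++ before the fused kernels
# are (re)built silences the compatibility warning above.
os.environ.setdefault("CXX", "g++")
```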
[2021-09-24 05:52:44,769] [INFO] [utils.py:680:see_memory_usage] Before Building Model
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/cuda/memory.py:373: FutureWarning: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved
warnings.warn(
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/cuda/memory.py:381: FutureWarning: torch.cuda.max_memory_cached has been renamed to torch.cuda.max_memory_reserved
warnings.warn(
[2021-09-24 05:52:44,770] [INFO] [utils.py:681:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
[2021-09-24 05:52:44,770] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 37.83 GB, percent = 20.2%
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=0, data=1, model=0): 4, [... 250 further entries elided; model varies fastest, then data, then pipe ...] ProcessCoord(pipe=7, data=7, model=3): 255}
[2021-09-24 05:52:46,176] [INFO] [module.py:360:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=7
     0: _to_float16
     1: EmbeddingPipe
     2:
     3: ParallelTransformerLayerPipe
     4: ParallelTransformerLayerPipe
     5: ParallelTransformerLayerPipe
     6: ParallelTransformerLayerPipe
stage=1 layers=4
     7: ParallelTransformerLayerPipe
     8: ParallelTransformerLayerPipe
     9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
stage=2 layers=4
    11: ParallelTransformerLayerPipe
    12: ParallelTransformerLayerPipe
    13: ParallelTransformerLayerPipe
    14: ParallelTransformerLayerPipe
stage=3 layers=4
    15: ParallelTransformerLayerPipe
    16: ParallelTransformerLayerPipe
    17: ParallelTransformerLayerPipe
    18: ParallelTransformerLayerPipe
stage=4 layers=4
    19: ParallelTransformerLayerPipe
    20: ParallelTransformerLayerPipe
    21: ParallelTransformerLayerPipe
    22: ParallelTransformerLayerPipe
stage=5 layers=4
    23: ParallelTransformerLayerPipe
    24: ParallelTransformerLayerPipe
    25: ParallelTransformerLayerPipe
    26: ParallelTransformerLayerPipe
stage=6 layers=4
    27: ParallelTransformerLayerPipe
    28: ParallelTransformerLayerPipe
    29: ParallelTransformerLayerPipe
    30: ParallelTransformerLayerPipe
stage=7 layers=8
    31: ParallelTransformerLayerPipe
    32: ParallelTransformerLayerPipe
    33: ParallelTransformerLayerPipe
    34: ParallelTransformerLayerPipe
    35:
    36: MixedFusedLayerNorm
    37: EmbeddingPipe
    38: float16_to_fp32
  loss: CrossEntropy
> number of parameters on (tensor, pipeline) model parallel rank (2, 1): 1745293312
[... every rank on pipeline stages 1-6 reports the same count, 1745293312 ...]
> number of parameters on (tensor, pipeline) model parallel rank (0, 7): 1986498560
[... every tensor rank on pipeline stage 7 reports 1986498560 ...]
> number of parameters on (tensor, pipeline) model parallel rank (2, 0): 1986465792
[... tensor ranks 1 and 3 on pipeline stage 0 report the same ...]
[2021-09-24 05:52:47,386] [INFO] [utils.py:680:see_memory_usage] After Building Model
[2021-09-24 05:52:47,387] [INFO] [utils.py:681:see_memory_usage] MA 3.77 GB Max_MA 3.79 GB CA 3.79 GB Max_CA 4 GB
[2021-09-24 05:52:47,388] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 38.02 GB, percent = 20.3%
> number of parameters on (tensor, pipeline) model parallel rank (0, 0): 1986465792
setting training iterations to 159576
> learning rate decay style: cosine
DeepSpeed is enabled.
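Two sanity checks on the dump above. First, the topology dictionary is a plain row-major enumeration (model index fastest, then data, then pipe), so any global rank is recoverable from its coordinate. Second, summing one copy of each pipeline stage over the 4 tensor-parallel shards reproduces the TOTAL_PARAMS figure the engine prints a little further down. In Python, with all constants copied from the log:

```python
# (1) Rank layout: rank = pipe*(DP*TP) + data*TP + model, with TP=4, DP=8.
TP, DP = 4, 8

def coord_to_rank(pipe: int, data: int, model: int) -> int:
    return pipe * (DP * TP) + data * TP + model

assert coord_to_rank(0, 0, 1) == 1     # matches the topology dict
assert coord_to_rank(1, 0, 0) == 32
assert coord_to_rank(7, 7, 3) == 255

# (2) One copy of each pipeline stage, times the 4 tensor-parallel shards,
#     reproduces TOTAL_PARAMS from the engine.py startup lines below.
stage_params = [1986465792] + [1745293312] * 6 + [1986498560]
assert 4 * sum(stage_params) == 57_778_896_896   # TOTAL_PARAMS=57778896896
```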
[2021-09-24 05:52:47,464] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.4.2+bc17042, git-hash=bc17042, git-branch=big-science
[2021-09-24 05:52:47,544] [INFO] [engine.py:179:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-09-24 05:52:47,544] [INFO] [engine.py:736:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-09-24 05:52:47,544] [INFO] [engine.py:741:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-09-24 05:52:47,545] [INFO] [engine.py:750:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2021-09-24 05:52:47,545] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2021-09-24 05:52:47,545] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2021-09-24 05:52:47,545] [INFO] [stage2.py:106:__init__] Reduce bucket size 500000000
[2021-09-24 05:52:47,545] [INFO] [stage2.py:107:__init__] Allgather bucket size 500000000
[2021-09-24 05:52:47,545] [INFO] [stage2.py:108:__init__] CPU Offload: False
[2021-09-24 05:52:47,545] [INFO] [stage2.py:109:__init__] Round robin gradient partitioning: False
[2021-09-24 05:52:52,071] [INFO] [stage2.py:419:__init__] optimizer state initialized
[2021-09-24 05:52:52,071] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2021-09-24 05:52:52,071] [INFO] [engine.py:553:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2021-09-24 05:52:52,071] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2021-09-24 05:52:52,072] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-09-24 05:52:52,072] [INFO] [config.py:900:print] DeepSpeedEngine configuration:
[2021-09-24 05:52:52,072] [INFO] [config.py:904:print] activation_checkpointing_config {
    "partition_activations": false,
    "contiguous_memory_optimization": false,
    "cpu_checkpointing": false,
    "number_checkpoints": null,
    "synchronize_checkpoint_boundary": false,
    "profile": false
}
[2021-09-24 05:52:52,072] [INFO] [config.py:904:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2021-09-24 05:52:52,072] [INFO] [config.py:904:print] allreduce_always_fp32 ........ False
[2021-09-24 05:52:52,072] [INFO] [config.py:904:print] amp_enabled .................. False
[2021-09-24 05:52:52,072] [INFO] [config.py:904:print] amp_params ................... False
[2021-09-24 05:52:52,072] [INFO] [config.py:904:print] checkpoint_tag_validation_enabled True
[2021-09-24 05:52:52,072] [INFO] [config.py:904:print] checkpoint_tag_validation_fail False
[2021-09-24 05:52:52,072] [INFO] [config.py:904:print] disable_allgather ............ False
[2021-09-24 05:52:52,072] [INFO] [config.py:904:print] dump_state ................... False
[2021-09-24 05:52:52,072] [INFO] [config.py:904:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
[2021-09-24 05:52:52,072] [INFO] [config.py:904:print] eigenvalue_enabled ........... False
[2021-09-24 05:52:52,072] [INFO] [config.py:904:print] eigenvalue_gas_boundary_resolution 1
[2021-09-24 05:52:52,072] [INFO] [config.py:904:print] eigenvalue_layer_name ........ bert.encoder.layer
[2021-09-24 05:52:52,072] [INFO] [config.py:904:print] eigenvalue_layer_num ......... 0
[2021-09-24 05:52:52,072] [INFO] [config.py:904:print] eigenvalue_max_iter .......... 100
[2021-09-24 05:52:52,072] [INFO] [config.py:904:print] eigenvalue_stability ......... 1e-06
[2021-09-24 05:52:52,072] [INFO] [config.py:904:print] eigenvalue_tol ............... 0.01
[2021-09-24 05:52:52,072] [INFO] [config.py:904:print] eigenvalue_verbose ........... False
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] elasticity_enabled ........... False
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] flops_profiler_config ........ {
    "enabled": false,
    "profile_step": 1,
    "module_depth": -1,
    "top_modules": 1,
    "detailed": true,
    "output_file": null
}
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] fp16_enabled ................. True
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] fp16_mixed_quantize .......... False
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] global_rank .................. 0
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] gradient_accumulation_steps .. 256
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] gradient_clipping ............ 1.0
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] gradient_predivide_factor .... 1.0
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] initial_dynamic_scale ........ 4096
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] loss_scale ................... 0
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] memory_breakdown ............. False
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] optimizer_legacy_fusion ...... False
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] optimizer_name ............... None
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] optimizer_params ............. None
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] pld_enabled .................. False
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] pld_params ................... False
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] prescale_gradients ........... False
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] quantize_change_rate ......... 0.001
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] quantize_groups .............. 1
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] quantize_offset .............. 1000
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] quantize_period .............. 1000
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] quantize_rounding ............ 0
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] quantize_start_bits .......... 16
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] quantize_target_bits ......... 8
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] quantize_training_enabled .... False
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] quantize_type ................ 0
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] quantize_verbose ............. False
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] scheduler_name ............... None
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] scheduler_params ............. None
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] sparse_attention ............. None
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] sparse_gradients_enabled ..... False
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] steps_per_print .............. 2000
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] tensorboard_enabled .......... False
[2021-09-24 05:52:52,073] [INFO] [config.py:904:print] tensorboard_job_name ......... DeepSpeedJobName
[2021-09-24 05:52:52,074] [INFO] [config.py:904:print] tensorboard_output_path ......
[2021-09-24 05:52:52,074] [INFO] [config.py:904:print] train_batch_size ............. 2048
[2021-09-24 05:52:52,074] [INFO] [config.py:904:print] train_micro_batch_size_per_gpu 1
[2021-09-24 05:52:52,074] [INFO] [config.py:904:print] use_quantizer_kernel ......... False
[2021-09-24 05:52:52,074] [INFO] [config.py:904:print] wall_clock_breakdown ......... False
[2021-09-24 05:52:52,074] [INFO] [config.py:904:print] world_size ................... 8
[2021-09-24 05:52:52,074] [INFO] [config.py:904:print] zero_allow_untested_optimizer False
[2021-09-24 05:52:52,074] [INFO] [config.py:904:print] zero_config .................. {
    "stage": 1,
    "contiguous_gradients": false,
    "reduce_scatter": true,
    "reduce_bucket_size": 5.000000e+08,
    "allgather_partitions": true,
    "allgather_bucket_size": 5.000000e+08,
    "overlap_comm": false,
    "load_from_fp32_weights": true,
    "elastic_checkpoint": true,
    "offload_param": null,
    "offload_optimizer": null,
    "sub_group_size": 1.000000e+09,
    "prefetch_bucket_size": 5.000000e+07,
    "param_persistence_threshold": 1.000000e+05,
    "max_live_parameters": 1.000000e+09,
    "max_reuse_distance": 1.000000e+09,
    "gather_fp16_weights_on_model_save": false,
    "ignore_unused_parameters": true,
    "round_robin_gradients": false,
    "legacy_stage1": false
}
[2021-09-24 05:52:52,074] [INFO] [config.py:904:print] zero_enabled ................. True
[2021-09-24 05:52:52,074] [INFO] [config.py:904:print] zero_optimization_stage ...... 1
[2021-09-24 05:52:52,074] [INFO] [config.py:906:print] json = {
    "train_micro_batch_size_per_gpu": 1,
    "train_batch_size": 2.048000e+03,
    "gradient_clipping": 1.0,
    "zero_optimization": {
        "stage": 1
    },
    "fp16": {
        "enabled": true,
        "loss_scale": 0,
        "loss_scale_window": 500,
        "hysteresis": 2,
        "min_loss_scale": 1,
        "initial_scale_power": 12
    },
    "steps_per_print": 2.000000e+03,
    "wall_clock_breakdown": false
}
[2021-09-24 05:52:52,074] [INFO] [engine.py:76:__init__] CONFIG: micro_batches=256 micro_batch_size=1
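The batch-size knobs in this config are consistent: train_batch_size must equal the per-GPU micro-batch size times the gradient-accumulation steps times the data-parallel degree (the world_size of 8 printed here corresponds to the data-parallel group, not the 256 global ranks). Checking:

```python
# DeepSpeed's invariant:
#   train_batch_size = micro_batch_size * gradient_accumulation_steps * DP
micro_per_gpu = 1      # train_micro_batch_size_per_gpu
grad_accum    = 256    # gradient_accumulation_steps (= micro_batches)
dp_degree     = 8      # world_size in the config dump above
assert micro_per_gpu * grad_accum * dp_degree == 2048  # train_batch_size
```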
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=0 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=3 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=1 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=2 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=64 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=66 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=65 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=67 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=195 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=193 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=192 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=194 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=130 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=129 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=128 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=131 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=97 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=96 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=98 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=32 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=35 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=34 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=33 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=160 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=161 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=224 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=227 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=226 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=225 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=99 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=163 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=162 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
> using checkpoint value 6e-05 for learning rate
> using checkpoint value 6e-06 for minimum learning rate
> using checkpoint value 216320 for warmup iterations
> using checkpoint value 126953125 for total number of iterations
> using checkpoint value cosine for decay style
successfully loaded 8 ZeRO state_dicts for rank 168
successfully loaded 8 ZeRO state_dicts for rank 171
successfully loaded 8 ZeRO state_dicts for rank 176
successfully loaded 8 ZeRO state_dicts for rank 88
successfully loaded 8 ZeRO state_dicts for rank 170
successfully loaded 8 ZeRO state_dicts for rank 132
successfully loaded 8 ZeRO state_dicts for rank 156
successfully loaded 8 ZeRO state_dicts for rank 169
successfully loaded 8 ZeRO state_dicts for rank 159
successfully loaded 8 ZeRO state_dicts for rank 124
successfully loaded 8 ZeRO state_dicts for rank 32
successfully loaded 8 ZeRO state_dicts for rank 49
successfully loaded 8 ZeRO state_dicts for rank 96
successfully loaded 8 ZeRO state_dicts for rank 167
successfully loaded 8 ZeRO state_dicts for rank 127
successfully loaded 8 ZeRO state_dicts for rank 60
successfully loaded 8 ZeRO state_dicts for rank 148
successfully loaded 8 ZeRO state_dicts for rank 48
successfully loaded 8 ZeRO state_dicts for rank 99
successfully loaded 8 ZeRO state_dicts for rank 140
successfully loaded 8 ZeRO state_dicts for rank 144
successfully loaded 8 ZeRO state_dicts for rank 104
successfully loaded 8 ZeRO state_dicts for rank 112
successfully loaded 8 ZeRO state_dicts for rank 68
successfully loaded 8 ZeRO state_dicts for rank 120
loading 8 zero partition checkpoints for rank 168
successfully loaded 8 ZeRO state_dicts for rank 193
successfully loaded 8 ZeRO state_dicts for rank 210
successfully loaded 8 ZeRO state_dicts for rank 69
successfully loaded 8 ZeRO state_dicts for rank 52
successfully loaded 8 ZeRO state_dicts for rank 157
successfully loaded 8 ZeRO state_dicts for rank 40
successfully loaded 8 ZeRO state_dicts for rank 129
successfully loaded 8 ZeRO state_dicts for rank 201
successfully loaded 8 ZeRO state_dicts for rank 209
successfully loaded 8 ZeRO state_dicts for rank 145
successfully loaded 8 ZeRO state_dicts for rank 111
successfully loaded 8 ZeRO state_dicts for rank 211
successfully loaded 8 ZeRO state_dicts for rank 135
successfully loaded 8 ZeRO state_dicts for rank 141
successfully loaded 8 ZeRO state_dicts for rank 139
successfully loaded 8 ZeRO state_dicts for rank 172
successfully loaded 8 ZeRO state_dicts for rank 80
successfully loaded 8 ZeRO state_dicts for rank 215
successfully loaded 8 ZeRO state_dicts for rank 106
successfully loaded 8 ZeRO state_dicts for rank 187
successfully loaded 8 ZeRO state_dicts for rank 137
successfully loaded 8 ZeRO state_dicts for rank 133
successfully loaded 8 ZeRO state_dicts for rank 90
successfully loaded 8 ZeRO state_dicts for rank 74
successfully loaded 8 ZeRO state_dicts for rank 34
successfully loaded 8 ZeRO state_dicts for rank 143
successfully loaded 8 ZeRO state_dicts for rank 200
successfully loaded 8 ZeRO state_dicts for rank 122
successfully loaded 8 ZeRO state_dicts for rank 125
successfully loaded 8 ZeRO state_dicts for rank 228
successfully loaded 8 ZeRO state_dicts for rank 81
successfully loaded 8 ZeRO state_dicts for rank 105
successfully loaded 8 ZeRO state_dicts for rank 163
successfully loaded 8 ZeRO state_dicts for rank 64
successfully loaded 8 ZeRO state_dicts for rank 186
successfully loaded 8 ZeRO state_dicts for rank 97
successfully loaded 8 ZeRO state_dicts for rank 70
successfully loaded 8 ZeRO state_dicts for rank 51
successfully loaded 8 ZeRO state_dicts for rank 77
successfully loaded 8 ZeRO state_dicts for rank 160
successfully loaded 8 ZeRO state_dicts for rank 50
successfully loaded 8 ZeRO state_dicts for rank 202
successfully loaded 8 ZeRO state_dicts for rank 98
successfully loaded 8 ZeRO state_dicts for rank 20
successfully loaded 8 ZeRO state_dicts for rank 85
successfully loaded 8 ZeRO state_dicts for rank 89
successfully loaded 8 ZeRO state_dicts for rank 214
successfully loaded 8 ZeRO state_dicts for rank 114
successfully loaded 8 ZeRO state_dicts for rank 149
successfully loaded 8 ZeRO state_dicts for rank 123
successfully loaded 8 ZeRO state_dicts for rank 71
successfully loaded 8 ZeRO state_dicts for rank 126
successfully loaded 8 ZeRO state_dicts for rank 152
successfully loaded 8 ZeRO state_dicts for rank 203
successfully loaded 8 ZeRO state_dicts for rank 166
successfully loaded 8 ZeRO state_dicts for rank 41
successfully loaded 8 ZeRO state_dicts for rank 222
successfully loaded 8 ZeRO state_dicts for rank 130
successfully loaded 8 ZeRO state_dicts for rank 216
successfully loaded 8 ZeRO state_dicts for rank 84
successfully loaded 8 ZeRO state_dicts for rank 100
successfully loaded 8 ZeRO state_dicts for rank 42
successfully loaded 8 ZeRO state_dicts for rank 190
successfully loaded 8 ZeRO state_dicts for rank 12
successfully loaded 8 ZeRO state_dicts for rank 44
successfully loaded 8 ZeRO state_dicts for rank 108
successfully loaded 8 ZeRO state_dicts for rank 219
successfully loaded 8 ZeRO state_dicts for rank 206
successfully loaded 8 ZeRO state_dicts for rank 128
successfully loaded 8 ZeRO state_dicts for rank 37
successfully loaded 8 ZeRO state_dicts for rank 33
successfully loaded 8 ZeRO state_dicts for rank 56
successfully loaded 8 ZeRO state_dicts for rank 62
successfully loaded 8 ZeRO state_dicts for rank 115
successfully loaded 8 ZeRO state_dicts for rank 24
successfully loaded 8 ZeRO state_dicts for rank 45
successfully loaded 8 ZeRO state_dicts for rank 192
successfully loaded 8 ZeRO state_dicts for rank 153
successfully loaded 8 ZeRO state_dicts for rank 134
successfully loaded 8 ZeRO state_dicts for rank 136
successfully loaded 8 ZeRO state_dicts for rank 38
successfully loaded 8 ZeRO state_dicts for rank 131
successfully loaded 8 ZeRO state_dicts for rank 121
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-24 05:53:20 CEST)" was missed by 0:00:03.058626
successfully loaded 8 ZeRO state_dicts for rank 217
successfully loaded 8 ZeRO state_dicts for rank 146
successfully loaded 8 ZeRO state_dicts for rank 195
successfully loaded 8 ZeRO state_dicts for rank 82
successfully loaded 8 ZeRO state_dicts for rank 191
successfully loaded 8 ZeRO state_dicts for rank 113
successfully loaded 8 ZeRO state_dicts for rank 158
successfully loaded 8 ZeRO state_dicts for rank 208
loading 8 zero partition checkpoints for rank 176
successfully loaded 8 ZeRO state_dicts for rank 65
successfully loaded 8 ZeRO state_dicts for rank 78
successfully loaded 8 ZeRO state_dicts for rank 93
successfully loaded 8 ZeRO state_dicts for rank 188
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-24 05:53:20 CEST)" was missed by 0:00:03.434951
successfully loaded 8 ZeRO state_dicts for rank 162
successfully loaded 8 ZeRO state_dicts for rank 63
successfully loaded 8 ZeRO state_dicts for rank 61
successfully loaded 8 ZeRO state_dicts for rank 221
successfully loaded 8 ZeRO state_dicts for rank 107
successfully loaded 8 ZeRO state_dicts for rank 179
successfully loaded 8 ZeRO state_dicts for rank 147
successfully loaded 8 ZeRO state_dicts for rank 36
loading 8 zero partition checkpoints for rank 132
successfully loaded 8 ZeRO state_dicts for rank 116
successfully loaded 8 ZeRO state_dicts for rank 199
loading 8 zero partition checkpoints for rank 88
loading 8 zero partition checkpoints for rank 170
successfully loaded 8 ZeRO state_dicts for rank 151
successfully loaded 8 ZeRO state_dicts for rank 76
successfully loaded 8 ZeRO state_dicts for rank 35
successfully loaded 8 ZeRO state_dicts for rank 223
successfully loaded 8 ZeRO state_dicts for rank 175
successfully loaded 8 ZeRO state_dicts for rank 13
successfully loaded 8 ZeRO state_dicts for rank 207
successfully loaded 8 ZeRO state_dicts for rank 218
successfully loaded 8 ZeRO state_dicts for rank 213
successfully loaded 8 ZeRO state_dicts for rank 119
successfully loaded 8 ZeRO state_dicts for rank 198
successfully loaded 8 ZeRO state_dicts for rank 164
loading 8 zero partition checkpoints for rank 159
successfully loaded 8 ZeRO state_dicts for rank 109
successfully loaded 8 ZeRO state_dicts for rank 197
successfully loaded 8 ZeRO state_dicts for rank 66
successfully loaded 8 ZeRO state_dicts for rank 22
successfully loaded 8 ZeRO state_dicts for rank 185
successfully loaded 8 ZeRO state_dicts for rank 196
successfully loaded 8 ZeRO state_dicts for rank 43
successfully loaded 8 ZeRO state_dicts for rank 204
successfully loaded 8 ZeRO state_dicts for rank 205
successfully loaded 8 ZeRO state_dicts for rank 181
successfully loaded 8 ZeRO state_dicts for rank 25
successfully loaded 8 ZeRO state_dicts for rank 91
successfully loaded 8 ZeRO state_dicts for rank 212
successfully loaded 8 ZeRO state_dicts for rank 173
successfully loaded 8 ZeRO state_dicts for rank 39
successfully loaded 8 ZeRO state_dicts for rank 161
successfully loaded 8 ZeRO state_dicts for rank 29
successfully loaded 8 ZeRO state_dicts for rank 26
successfully loaded 8 ZeRO state_dicts for rank 180
successfully loaded 8 ZeRO state_dicts for rank 28
successfully loaded 8 ZeRO state_dicts for rank 87
successfully loaded 8 ZeRO state_dicts for rank 53
successfully loaded 8 ZeRO state_dicts for rank 194
successfully loaded 8 ZeRO state_dicts for rank 54
successfully loaded 8 ZeRO state_dicts for rank 73
successfully loaded 8 ZeRO state_dicts for rank 21
successfully loaded 8 ZeRO state_dicts for rank 27
successfully loaded 8 ZeRO state_dicts for rank 46
successfully loaded 8 ZeRO state_dicts for rank 67
loading 8 zero partition checkpoints for rank 32
successfully loaded 8 ZeRO state_dicts for rank 184
successfully loaded 8 ZeRO state_dicts for rank 165
successfully loaded 8 ZeRO state_dicts for rank 118
successfully loaded 8 ZeRO state_dicts for rank 220
successfully loaded 8 ZeRO state_dicts for rank 57
successfully loaded 8 ZeRO state_dicts for rank 75
successfully loaded 8 ZeRO state_dicts for rank 0
successfully loaded 8 ZeRO state_dicts for rank 92
loading 8 zero partition checkpoints for rank 124
successfully loaded 8 ZeRO state_dicts for rank 94
successfully loaded 8 ZeRO state_dicts for rank 55
successfully loaded 8 ZeRO state_dicts for rank 72
successfully loaded 8 ZeRO state_dicts for rank 83
successfully loaded 8 ZeRO state_dicts for rank 6
successfully loaded 8 ZeRO state_dicts for rank 86
successfully loaded 8 ZeRO state_dicts for rank 189
successfully loaded 8 ZeRO state_dicts for rank 5
successfully loaded 8 ZeRO state_dicts for rank 117
successfully loaded 8 ZeRO state_dicts for rank 4
successfully loaded 8 ZeRO state_dicts for rank 30
successfully loaded 8 ZeRO state_dicts for rank 155
successfully loaded 8 ZeRO state_dicts for rank 1
successfully loaded 8 ZeRO state_dicts for rank 110
successfully loaded 8 ZeRO state_dicts for rank 58
successfully loaded 8 ZeRO state_dicts for rank 79
successfully loaded 8 ZeRO state_dicts for rank 101
successfully loaded 8 ZeRO state_dicts for rank 177
successfully loaded 8 ZeRO state_dicts for rank 2
loading 8 zero partition checkpoints for rank 167
successfully loaded 8 ZeRO state_dicts for rank 95
successfully loaded 8 ZeRO state_dicts for rank 227
loading 8 zero partition checkpoints for rank 171
successfully loaded 8 ZeRO state_dicts for rank 103
successfully loaded 8 ZeRO state_dicts for rank 142
loading 8 zero partition checkpoints for rank 96
successfully loaded 8 ZeRO state_dicts for rank 10
loading 8 zero partition checkpoints for rank 127
successfully loaded 8 ZeRO state_dicts for rank 31
successfully loaded 8 ZeRO state_dicts for rank 178
successfully loaded 8 ZeRO state_dicts for rank 3
successfully loaded 8 ZeRO state_dicts for rank 154
successfully loaded 8 ZeRO state_dicts for rank 47
successfully loaded 8 ZeRO state_dicts for rank 59
successfully loaded 8 ZeRO state_dicts for rank 23
successfully loaded 8 ZeRO state_dicts for rank 15
loading 8 zero partition checkpoints for rank 148
successfully loaded 8 ZeRO state_dicts for rank 182
successfully loaded 8 ZeRO state_dicts for rank 14
successfully loaded 8 ZeRO state_dicts for rank 252
successfully loaded 8 ZeRO state_dicts for rank 236
successfully loaded 8 ZeRO state_dicts for rank 224
successfully loaded 8 ZeRO state_dicts for rank 183
loading 8 zero partition checkpoints for rank 144
successfully loaded 8 ZeRO state_dicts for rank 138
loading 8 zero partition checkpoints for rank 99
successfully loaded 8 ZeRO state_dicts for rank 230
loading 8 zero partition checkpoints for rank 120
successfully loaded 8 ZeRO state_dicts for rank 238
loading 8 zero partition checkpoints for rank 156
successfully loaded 8 ZeRO state_dicts for rank 226
successfully loaded 8 ZeRO state_dicts for rank 8
successfully loaded 8 ZeRO state_dicts for rank 231
successfully loaded 8 ZeRO state_dicts for rank 243
successfully loaded 8 ZeRO state_dicts for rank 246
successfully loaded 8 ZeRO state_dicts for rank 150
successfully loaded 8 ZeRO state_dicts for rank 239
successfully loaded 8 ZeRO state_dicts for rank 250
loading 8 zero partition checkpoints for rank 104
successfully loaded 8 ZeRO state_dicts for rank 242
successfully loaded 8 ZeRO state_dicts for rank 234
loading 8 zero partition checkpoints for rank 140
successfully loaded 8 ZeRO state_dicts for rank 240
loading 8 zero partition checkpoints for rank 193
successfully loaded 8 ZeRO state_dicts for rank 254
loading 8 zero partition checkpoints for rank 169
successfully loaded 8 ZeRO state_dicts for rank 244
successfully loaded 8 ZeRO state_dicts for rank 9
loading 8 zero partition checkpoints for rank 112
successfully loaded 8 ZeRO state_dicts for rank 7
successfully loaded 8 ZeRO state_dicts for rank 241
loading 8 zero partition checkpoints for rank 69
successfully loaded 8 ZeRO state_dicts for rank 237
successfully loaded 8 ZeRO state_dicts for rank 174
loading 8 zero partition checkpoints for rank 201
successfully loaded 8 ZeRO state_dicts for rank 229
successfully loaded 8 ZeRO state_dicts for rank 248
successfully loaded 8 ZeRO state_dicts for rank 235
successfully loaded 8 ZeRO state_dicts for rank 253
loading 8 zero partition checkpoints for rank 209
loading 8 zero partition checkpoints for rank 40
loading 8 zero partition checkpoints for rank 60
successfully loaded 8 ZeRO state_dicts for rank 225
loading 8 zero partition checkpoints for rank 80
successfully loaded 8 ZeRO state_dicts for rank 232
successfully loaded 8 ZeRO state_dicts for rank 255
successfully loaded 8 ZeRO state_dicts for rank 247
loading 8 zero partition checkpoints for rank 90
loading 8 zero partition checkpoints for rank 143
successfully loaded 8 ZeRO state_dicts for rank 251
successfully loaded 8 ZeRO state_dicts for rank 233
loading 8 zero partition checkpoints for rank 125
loading 8 zero partition checkpoints for rank 34
loading 8 zero partition checkpoints for rank 106
successfully loaded 8 ZeRO state_dicts for rank 245
loading 8 zero partition checkpoints for rank 137
loading 8 zero partition checkpoints for rank 81
successfully loaded 8 ZeRO state_dicts for rank 102
loading 8 zero partition checkpoints for rank 187
loading 8 zero partition checkpoints for rank 215
successfully loaded 8 ZeRO state_dicts for rank 249
loading 8 zero partition checkpoints for rank 186
loading 8 zero partition checkpoints for rank 105
loading 8 zero partition checkpoints for rank 64
loading 8 zero partition checkpoints for rank 74
loading 8 zero partition checkpoints for rank 160
loading 8 zero partition checkpoints for rank 216
loading 8 zero partition checkpoints for rank 77
loading 8 zero partition checkpoints for rank 139
loading 8 zero partition checkpoints for rank 149
loading 8 zero partition checkpoints for rank 89
loading 8 zero partition checkpoints for rank 114
loading 8 zero partition checkpoints for rank 152
loading 8 zero partition checkpoints for rank 42
loading 8 zero partition checkpoints for rank 108
loading 8 zero partition checkpoints for rank 228
loading 8 zero partition checkpoints for rank 206
loading 8 zero partition checkpoints for rank 33
loading 8 zero partition checkpoints for rank 41
loading 8 zero partition checkpoints for rank 135
loading 8 zero partition checkpoints for rank 71
loading 8 zero partition checkpoints for rank 222
loading 8 zero partition checkpoints for rank 62
loading 8 zero partition checkpoints for rank 134
successfully loaded 8 ZeRO state_dicts for rank 11
loading 8 zero partition checkpoints for rank 129
loading 8 zero partition checkpoints for rank 126
loading 8 zero partition checkpoints for rank 192
loading 8 zero partition checkpoints for rank 153
loading 8 zero partition checkpoints for rank 202
loading 8 zero partition checkpoints for rank 128
loading 8 zero partition checkpoints for rank 84
loading 8 zero partition checkpoints for rank 141
loading 8 zero partition checkpoints for rank 45
loading 8 zero partition checkpoints for rank 115
loading 8 zero partition checkpoints for rank 56
loading 8 zero partition checkpoints for rank 111
loading 8 zero partition checkpoints for rank 121
loading 8 zero partition checkpoints for rank 130
loading 8 zero partition checkpoints for rank 20
loading 8 zero partition checkpoints for rank 133
loading 8 zero partition checkpoints for rank 38
loading 8 zero partition checkpoints for rank 122
loading 8 zero partition checkpoints for rank 97
loading 8 zero partition checkpoints for rank 158
loading 8 zero partition checkpoints for rank 85
loading 8 zero partition checkpoints for rank 157
loading 8 zero partition checkpoints for rank 78
loading 8 zero partition checkpoints for rank 162
loading 8 zero partition checkpoints for rank 191
loading 8 zero partition checkpoints for rank 65
loading 8 zero partition checkpoints for rank 44
loading 8 zero partition checkpoints for rank 82
loading 8 zero partition checkpoints for rank 98
loading 8 zero partition checkpoints for rank 63
loading 8 zero partition checkpoints for rank 12
loading 8 zero partition checkpoints for rank 113
loading 8 zero partition checkpoints for rank 188
loading 8 zero partition checkpoints for rank 151
loading 8 zero partition checkpoints for rank 146
loading 8 zero partition checkpoints for rank 36
loading 8 zero partition checkpoints for rank 123
loading 8 zero partition checkpoints for rank 210
loading 8 zero partition checkpoints for rank 37
loading 8 zero partition checkpoints for rank 119
loading 8 zero partition checkpoints for rank 197
loading 8 zero partition checkpoints for rank 223
loading 8 zero partition checkpoints for rank 52
loading 8 zero partition checkpoints for rank 179
loading 8 zero partition checkpoints for rank 76
loading 8 zero partition checkpoints for rank 218
loading 8 zero partition checkpoints for rank 219
loading 8 zero partition checkpoints for rank 35
loading 8 zero partition checkpoints for rank 107
loading 8 zero partition checkpoints for rank 163
loading 8 zero partition checkpoints for rank 43
loading 8 zero partition checkpoints for rank 212
loading 8 zero partition checkpoints for rank 49
loading 8 zero partition checkpoints for rank 208
loading 8 zero partition checkpoints for rank 181
loading 8 zero partition checkpoints for rank 91
loading 8 zero partition checkpoints for rank 185
loading 8 zero partition checkpoints for rank 214
loading 8 zero partition checkpoints for rank 53
loading 8 zero partition checkpoints for rank 75
loading 8 zero partition checkpoints for rank 46
loading 8 zero partition checkpoints for rank 165
loading 8 zero partition checkpoints for rank 57
loading 8 zero partition checkpoints for rank 211
loading 8 zero partition checkpoints for rank 180
loading 8 zero partition checkpoints for rank 55
loading 8 zero partition checkpoints for rank 217
loading 8 zero partition checkpoints for rank 92
loading 8 zero partition checkpoints for rank 61
loading 8 zero partition checkpoints for rank 110
loading 8 zero partition checkpoints for rank 196
loading 8 zero partition checkpoints for rank 205
loading 8 zero partition checkpoints for rank 83
loading 8 zero partition checkpoints for rank 25
loading 8 zero partition checkpoints for rank 68
loading 8 zero partition checkpoints for rank 195
loading 8 zero partition checkpoints for rank 118
loading 8 zero partition checkpoints for rank 79
loading 8 zero partition checkpoints for rank 155
loading 8 zero partition checkpoints for rank 184
loading 8 zero partition checkpoints for rank 94
loading 8 zero partition checkpoints for rank 39
loading 8 zero partition checkpoints for rank 27
loading 8 zero partition checkpoints for rank 21
loading 8 zero partition checkpoints for rank 58
loading 8 zero partition checkpoints for rank 103
loading 8 zero partition checkpoints for rank 100
loading 8 zero partition checkpoints for rank 101
loading 8 zero partition checkpoints for rank 154
loading 8 zero partition checkpoints for rank 131
loading 8 zero partition checkpoints for rank 145
loading 8 zero partition checkpoints for rank 0
loading 8 zero partition checkpoints for rank 136
checkpoint version 3.0
loading 8 zero partition checkpoints for rank 48
loading 8 zero partition checkpoints for rank 51
loading 8 zero partition checkpoints for rank 29
loading 8 zero partition checkpoints for rank 109
loading 8 zero partition checkpoints for rank 213
loading 8 zero partition checkpoints for rank 93
loading 8 zero partition checkpoints for rank 183
loading 8 zero partition checkpoints for rank 72
loading 8 zero partition checkpoints for rank 59
loading 8 zero partition checkpoints for rank 200
loading 8 zero partition checkpoints for rank 73
loading 8 zero partition checkpoints for rank 142
loading 8 zero partition checkpoints for rank 182
loading 8 zero partition checkpoints for rank 70
loading 8 zero partition checkpoints for rank 161
loading 8 zero partition checkpoints for rank 150
loading 8 zero partition checkpoints for rank 5
loading 8 zero partition checkpoints for rank 203
loading 8 zero partition checkpoints for rank 194
loading 8 zero partition checkpoints for rank 190
loading 8 zero partition checkpoints for rank 6
loading 8 zero partition checkpoints for rank 54
loading 8 zero partition checkpoints for rank 47
loading 8 zero partition checkpoints for rank 221
loading 8 zero partition checkpoints for rank 4
loading 8 zero partition checkpoints for rank 138
loading 8 zero partition checkpoints for rank 50
loading 8 zero partition checkpoints for rank 3
loading 8 zero partition checkpoints for rank 177
loading 8 zero partition checkpoints for rank 30
loading 8 zero partition checkpoints for rank 15
loading 8 zero partition checkpoints for rank 166
loading 8 zero partition checkpoints for rank 226
loading 8 zero partition checkpoints for rank 238
loading 8 zero partition checkpoints for rank 207
loading 8 zero partition checkpoints for rank 22
loading 8 zero partition checkpoints for rank 147
loading 8 zero partition checkpoints for rank 87
loading 8 zero partition checkpoints for rank 178
loading 8 zero partition checkpoints for rank 172
loading 8 zero partition checkpoints for rank 204
loading 8 zero partition checkpoints for rank 66
loading 8 zero partition checkpoints for rank 250
loading 8 zero partition checkpoints for rank 220
loading 8 zero partition checkpoints for rank 254
loading 8 zero partition checkpoints for rank 95
loading 8 zero partition checkpoints for rank 239
loading 8 zero partition checkpoints for rank 24
loading 8 zero partition checkpoints for rank 86
loading 8 zero partition checkpoints for rank 189
loading 8 zero partition checkpoints for rank 229
loading 8 zero partition checkpoints for rank 241
loading 8 zero partition checkpoints for rank 240
loading 8 zero partition checkpoints for rank 253
loading 8 zero partition checkpoints for rank 199
loading 8 zero partition checkpoints for rank 67
loading 8 zero partition checkpoints for rank 175
loading 8 zero partition checkpoints for rank 225
loading 8 zero partition checkpoints for rank 164
loading 8 zero partition checkpoints for rank 246
loading 8 zero partition checkpoints for rank 236
loading 8 zero partition checkpoints for rank 198
loading 8 zero partition checkpoints for rank 247
loading 8 zero partition checkpoints for rank 233
loading 8 zero partition checkpoints for rank 116
loading 8 zero partition checkpoints for rank 7
loading 8 zero partition checkpoints for rank 248
loading 8 zero partition checkpoints for rank 232
loading 8 zero partition checkpoints for rank 230
loading 8 zero partition checkpoints for rank 173
loading 8 zero partition checkpoints for rank 231
loading 8 zero partition checkpoints for rank 244
loading 8 zero partition checkpoints for rank 117
loading 8 zero partition checkpoints for rank 102
loading 8 zero partition checkpoints for rank 26
loading 8 zero partition checkpoints for rank 23
loading 8 zero partition checkpoints for rank 245
loading 8 zero partition checkpoints for rank 237
loading 8 zero partition checkpoints for rank 227
loading 8 zero partition checkpoints for rank 28
loading 8 zero partition checkpoints for rank 252
loading 8 zero partition checkpoints for rank 13
loading 8 zero partition checkpoints for rank 1
loading 8 zero partition checkpoints for rank 174
loading 8 zero partition checkpoints for rank 242
loading 8 zero partition checkpoints for rank 224
loading 8 zero partition checkpoints for rank 2
loading 8 zero partition checkpoints for rank 31
loading 8 zero partition checkpoints for rank 243
loading 8 zero partition checkpoints for rank 14
loading 8 zero partition checkpoints for rank 234
loading 8 zero partition checkpoints for rank 255
loading 8 zero partition checkpoints for rank 235
loading 8 zero partition checkpoints for rank 251
loading 8 zero partition checkpoints for rank 10
loading 8 zero partition checkpoints for rank 249
loading 8 zero partition checkpoints for rank 9
loading 8 zero partition checkpoints for rank 8
loading 8 zero partition checkpoints for rank 11
successfully loaded 8 ZeRO state_dicts for rank 17
successfully loaded 8 ZeRO state_dicts for rank 19
successfully loaded 8 ZeRO state_dicts for rank 18
successfully loaded 8 ZeRO state_dicts for rank 16
loading 8 zero partition checkpoints for rank 17
loading 8 zero partition checkpoints for rank 19
loading 8 zero partition checkpoints for rank 18
loading 8 zero partition checkpoints for rank 16
successfully loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints at iteration 942
time (ms) | load-checkpoint: 82978.97
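The engine banner and the checkpoint messages above pin down the 3D-parallel layout. Ranks run from 0 to 255, eight pipeline stages appear (STAGE=0..7), and every rank merges 8 ZeRO shards, i.e. a data-parallel degree of 8; the remaining factor of 4 must be tensor parallelism, consistent with exactly four ranks sharing each stage's STAGE_PARAMS. The decomposition itself is an inference; only the three observed factors come from the log.

```python
# Inferred 3D-parallel layout of the 256-GPU job; the tensor-parallel
# degree is derived, not printed anywhere in the log.
world_size = 256         # RANK=0..255 in the engine.py banner
pipeline_parallel = 8    # STAGE=0..7
data_parallel = 8        # "successfully loaded 8 ZeRO state_dicts" per rank

tensor_parallel = world_size // (pipeline_parallel * data_parallel)
assert tensor_parallel == 4  # four ranks listed per pipeline stage
```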
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-09-24 05:54:15
> building train, validation, and test datasets ...
> datasets target sizes (minimum size):
    train:      300000000
    validation: 1638400
    test:       10240
> building train, validation, and test datasets for GPT ...
 > building dataset index ...
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
 > finished creating indexed dataset in 0.135933 seconds
    number of documents: 304230423
 > dataset split:
    train:
     document indices in [0, 288714672) total of 288714672 documents
    validation:
     document indices in [288714672, 303926193) total of 15211521 documents
    test:
     document indices in [303926193, 304230423) total of 304230 documents
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_shuffle_idx.npy
    loaded indexed file in 0.348 seconds
    total number of samples: 394611670
    total number of epochs: 3
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_42s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_42s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_42s_shuffle_idx.npy
    loaded indexed file in 0.321 seconds
    total number of samples: 6927161
    total number of epochs: 1
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_42s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_42s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_42s_shuffle_idx.npy
    loaded indexed file in 0.062 seconds
    total number of samples: 137384
    total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-09-24 05:54:21
done with setup ...
training ...
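The three .npy mappings loaded above drive Megatron's GPT dataset sampling. Roughly: doc_idx is the multi-epoch document ordering, sample_idx marks where each fixed-length 2048-token sample starts and ends inside that ordering, and shuffle_idx permutes sample ids. Note the epoch count: three passes over the corpus yield 394,611,670 samples, enough to cover the 300,000,000-sample training target. A minimal sketch of reading them back, assuming the array semantics of Megatron-LM's GPTDataset (the path stem is copied from the log):

```python
# Sketch: inspect the train index mappings with numpy mmaps (nothing is
# pulled into RAM up front). Array semantics follow Megatron-LM's
# GPTDataset and are paraphrased here, not stated in the log itself.
import numpy as np

stem = ("/gpfswork/rech/six/commun/datasets-custom/oscar-en/"
        "meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s")

doc_idx = np.load(stem + "_doc_idx.npy", mmap_mode="r")          # document order over all epochs
sample_idx = np.load(stem + "_sample_idx.npy", mmap_mode="r")    # (doc_idx position, token offset) per sample boundary
shuffle_idx = np.load(stem + "_shuffle_idx.npy", mmap_mode="r")  # shuffled sample id -> sample id

# Sample i is delimited by two consecutive rows of sample_idx, after
# translating i through the shuffle:
i = 0
j = shuffle_idx[i]
start, end = sample_idx[j], sample_idx[j + 1]
print(start, end)
```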
time (ms) | model-and-optimizer-setup: 91017.54 | train/valid/test-data-iterators-setup: 4740.91
[before the start of training step] datetime: 2021-09-24 05:54:21
[2021-09-24 05:54:21,235] [INFO] [checkpointing.py:408:forward] Activation Checkpointing Information
[2021-09-24 05:54:21,235] [INFO] [checkpointing.py:409:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-09-24 05:54:21,235] [INFO] [checkpointing.py:412:forward] ----contiguous Memory Checkpointing False with 32 total layers
[2021-09-24 05:54:21,235] [INFO] [checkpointing.py:415:forward] ----Synchronization False
[2021-09-24 05:54:21,235] [INFO] [checkpointing.py:416:forward] ----Profiling time in checkpointing False
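The per-rank memory report that follows comes from PyTorch's CUDA allocator counters. A standalone approximation of the helper that prints it (Megatron-LM's report_memory; this version is a sketch, not the upstream code):

```python
# Approximation of the "[Rank N] ... memory (MB)" report printed below,
# built from PyTorch's standard CUDA allocator counters.
import torch

def report_memory(name: str) -> None:
    mb = 1024 * 1024
    print(f"{name} memory (MB)"
          f" | allocated: {torch.cuda.memory_allocated() / mb}"
          f" | max allocated: {torch.cuda.max_memory_allocated() / mb}"
          f" | reserved: {torch.cuda.memory_reserved() / mb}"
          f" | max reserved: {torch.cuda.max_memory_reserved() / mb}")
```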
[Rank 1] (after 943 iterations) memory (MB) | allocated: 6661.611328125 | max allocated: 11742.55810546875 | reserved: 22890.0 | max reserved: 22890.0
[Rank 225] (after 943 iterations) memory (MB) | allocated: 7107.70751953125 | max allocated: 11884.6845703125 | reserved: 22108.0 | max reserved: 22108.0
[Rank 65] (after 943 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 33] (after 943 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 97] (after 943 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 129] (after 943 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 193] (after 943 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18586.0 | max reserved: 18586.0
[Rank 161] (after 943 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 2] (after 943 iterations) memory (MB) | allocated: 6661.611328125 | max allocated: 11742.55810546875 | reserved: 21150.0 | max reserved: 21150.0
[Rank 34] (after 943 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 226] (after 943 iterations) memory (MB) | allocated: 7107.70751953125 | max allocated: 11884.6845703125 | reserved: 21700.0 | max reserved: 21700.0
[Rank 66] (after 943 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18586.0 | max reserved: 18586.0
[Rank 98] (after 943 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 162] (after 943 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 130] (after 943 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18458.0 | max reserved: 18458.0
[Rank 194] (after 943 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18826.0 | max reserved: 18826.0
[Rank 0] (after 943 iterations) memory (MB) | allocated: 6661.611328125 | max allocated: 11742.55810546875 | reserved: 23526.0 | max reserved: 23526.0
[Rank 32] (after 943 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 19012.0 | max reserved: 19012.0
[Rank 64] (after 943 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 19012.0 | max reserved: 19012.0
[Rank 224] (after 943 iterations) memory (MB) | allocated: 7107.70751953125 | max allocated: 11884.6845703125 | reserved: 22492.0 | max reserved: 22492.0
[Rank 96] (after 943 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18948.0 | max reserved: 18948.0
[Rank 128] (after 943 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 19012.0 | max reserved: 19012.0
[Rank 192] (after 943 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 19076.0 | max reserved: 19076.0
[Rank 160] (after 943 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 19012.0 | max reserved: 19012.0
[Rank 3] (after 943 iterations) memory (MB) | allocated: 6661.611328125 | max allocated: 11742.55810546875 | reserved: 21150.0 | max reserved: 21150.0
[Rank 35] (after 943 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18826.0 | max reserved: 18826.0
[Rank 227] (after 943 iterations) memory (MB) | allocated: 7107.70751953125 | max allocated: 11884.6845703125 | reserved: 22492.0 | max reserved: 22492.0
[Rank 67] (after 943 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18458.0 | max reserved: 18458.0
[Rank 99] (after 943 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18522.0 | max reserved: 18522.0
[Rank 163] (after 943 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 131] (after 943 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 195] (after 943 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18826.0 | max reserved: 18826.0
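The iteration lines that follow report a global batch size of 16 rather than the configured 2048; that is consistent with a batch-size ramp-up phase early in training (an inference from the log, not stated in it), and the consumed-samples counter checks out against it directly:

```python
# Pure arithmetic on the logged counters below; the ramp-up reading of the
# small global batch size is an inference.
iteration = 943
global_batch_size = 16
assert iteration * global_batch_size == 15088  # "consumed samples" at iteration 943
```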
iteration 943/ 159576 | consumed samples: 15088 | elapsed time per iteration (ms): 29806.1 | learning rate: 4.185E-06 | global batch size: 16 | lm loss: 7.642442E+00 | loss scale: 8192.0 | grad norm: 53639.718 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 944/ 159576 | consumed samples: 15104 | elapsed time per iteration (ms): 13012.2 | learning rate: 4.189E-06 | global batch size: 16 | lm loss: 7.638637E+00 | loss scale: 8192.0 | grad norm: 47002.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 945/ 159576 | consumed samples: 15120 | elapsed time per iteration (ms): 13551.8 | learning rate: 4.194E-06 | global batch size: 16 | lm loss: 7.559312E+00 | loss scale: 8192.0 | grad norm: 43680.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 946/ 159576 | consumed samples: 15136 | elapsed time per iteration (ms): 13672.0 | learning rate: 4.198E-06 | global batch size: 16 | lm loss: 7.372701E+00 | loss scale: 8192.0 | grad norm: 29642.562 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 947/ 159576 | consumed samples: 15152 | elapsed time per iteration (ms): 13523.5 | learning rate: 4.203E-06 | global batch size: 16 | lm loss: 7.431667E+00 | loss scale: 8192.0 | grad norm: 71525.963 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 948/ 159576 | consumed samples: 15168 | elapsed time per iteration (ms): 13571.1 | learning rate: 4.207E-06 | global batch size: 16 | lm loss: 7.622519E+00 | loss scale: 8192.0 | grad norm: 108314.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 949/ 159576 | consumed samples: 15184 | elapsed time per iteration (ms): 13513.7 | learning rate: 4.212E-06 | global batch size: 16 | lm loss: 7.491040E+00 | loss scale: 8192.0 | grad norm: 83775.616 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 950/ 159576 | consumed samples: 15200 | elapsed time per iteration (ms): 13857.2 | learning rate: 4.216E-06 | global batch size: 16 | lm loss: 7.689845E+00 | loss scale: 8192.0 | grad norm: 42694.796 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 951/ 159576 | consumed samples: 15216 | elapsed time per iteration (ms): 13556.0 | learning rate: 4.220E-06 | global batch size: 16 | lm loss: 7.541234E+00 | loss scale: 8192.0 | grad norm: 36744.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 952/ 159576 | consumed samples: 15232 | elapsed time per iteration (ms): 13565.0 | learning rate: 4.225E-06 | global batch size: 16 | lm loss: 7.402619E+00 | loss scale: 8192.0 | grad norm: 37335.008 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 953/ 159576 | consumed samples: 15248 | elapsed time per iteration (ms): 13600.8 | learning rate: 4.229E-06 | global batch size: 16 | lm loss: 7.524664E+00 | loss scale: 8192.0 | grad norm: 36490.188 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 954/ 159576 | consumed samples: 15264 | elapsed time per iteration (ms): 13538.1 | learning rate: 4.234E-06 | global batch size: 16 | lm loss: 6.926525E+00 | loss scale: 8192.0 | grad norm: 28573.010 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 955/ 159576 | consumed samples: 15280 | elapsed time per iteration (ms): 13767.3 | learning rate: 4.238E-06 | global batch size: 16 | lm loss: 7.564863E+00 | loss scale: 8192.0 | grad norm: 45556.471 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 956/ 159576 | consumed samples: 15296 | elapsed time per iteration (ms): 13529.6 | learning rate: 4.243E-06 | global batch size: 16 | lm loss: 7.518897E+00 | loss scale: 8192.0 | grad norm: 40483.089 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 957/ 159576 | consumed samples: 15312 | elapsed time per iteration (ms): 13548.2 | learning rate: 4.247E-06 | global batch size: 16 | lm loss: 7.292015E+00 | loss scale: 8192.0 | grad norm: 27123.950 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 958/ 159576 | consumed samples: 15328 | elapsed time per iteration (ms): 13592.2 | learning rate: 4.251E-06 | global batch size: 16 | lm loss: 7.645267E+00 | loss scale: 8192.0 | grad norm: 45895.591 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 959/ 159576 | consumed samples: 15344 | elapsed time per iteration (ms): 13834.7 | learning rate: 4.256E-06 | global batch size: 16 | lm loss: 7.439256E+00 | loss scale: 8192.0 | grad norm: 47827.958 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 960/ 159576 | consumed samples: 15360 | elapsed time per iteration (ms): 13548.7 | learning rate: 4.260E-06 | global batch size: 16 | lm loss: 7.398325E+00 | loss scale: 8192.0 | grad norm: 41514.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 961/ 159576 | consumed samples: 15376 | elapsed time per iteration (ms): 13540.1 | learning rate: 4.265E-06 | global batch size: 16 | lm loss: 7.498395E+00 | loss scale: 8192.0 | grad norm: 24323.912 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 962/ 159576 | consumed samples: 15392 | elapsed time per iteration (ms): 13596.3 | learning rate: 4.269E-06 | global batch size: 16 | lm loss: 7.458749E+00 | loss scale: 8192.0 | grad norm: 37806.541 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 963/ 159576 | consumed samples: 15408 | elapsed time per iteration (ms): 13925.1 | learning rate: 4.274E-06 | global batch size: 16 | lm loss: 7.414832E+00 | loss scale: 8192.0 | grad norm: 38291.446 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 964/ 159576 | consumed samples: 15424 | elapsed time per iteration (ms): 13505.9 | learning rate: 4.278E-06 | global batch size: 16 | lm loss: 7.552760E+00 | loss scale: 8192.0 | grad norm: 23290.618 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 965/ 159576 | consumed samples: 15440 | elapsed time per iteration (ms): 13598.7 | learning rate: 4.283E-06 | global batch size: 16 | lm loss: 7.566991E+00 | loss scale: 8192.0 | grad norm: 33429.496 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 966/ 159576 | consumed samples: 15456 | elapsed time per iteration (ms): 13495.5 | learning rate: 4.287E-06 | global batch size: 16 | lm loss: 7.727429E+00 | loss scale: 8192.0 | grad norm: 33196.940 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 967/ 159576 | consumed samples: 15472 | elapsed time per iteration (ms): 13508.3 | learning rate: 4.291E-06 | global batch size: 16 | lm loss: 7.517751E+00 | loss scale: 8192.0 | grad norm: 25674.592 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 968/ 159576 | consumed samples: 15488 | elapsed time per iteration (ms): 13747.8 | learning rate: 4.296E-06 | global batch size: 16 | lm loss: 7.534285E+00 | loss scale: 8192.0 | grad norm: 28899.517 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 969/ 159576 | consumed samples: 15504 | elapsed time per iteration (ms): 13541.9 | learning rate: 4.300E-06 | global batch size: 16 | lm loss: 7.412315E+00 | loss scale: 8192.0 | grad norm: 23856.723 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 970/ 159576 | consumed samples: 15520 | elapsed time per iteration (ms): 13581.6 | learning rate: 4.305E-06 | global batch size: 16 | lm loss: 7.574214E+00 | loss scale: 8192.0 | grad norm: 26912.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 971/ 159576 | consumed samples: 15536 | elapsed time per iteration (ms): 13575.2 | learning rate: 4.309E-06 | global batch size: 16 | lm loss: 7.489717E+00 | loss scale: 8192.0 | grad norm: 25683.773 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 972/ 159576 | consumed samples: 15552 | elapsed time per iteration (ms): 14047.8 | learning rate: 4.314E-06 | global batch size: 16 | lm loss: 7.479139E+00 | loss scale: 8192.0 | grad norm: 23963.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 973/ 159576 | consumed samples: 15568 | elapsed time per iteration (ms): 13519.1 | learning rate: 4.318E-06 | global batch size: 16 | lm loss: 7.557629E+00 | loss scale: 8192.0 | grad norm: 28281.687 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 974/ 159576 | consumed samples: 15584 | elapsed time per iteration (ms): 13508.3 | learning rate: 4.322E-06 | global batch size: 16 | lm loss: 7.324095E+00 | loss scale: 8192.0 | grad norm: 24628.133 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 975/ 159576 | consumed samples: 15600 | elapsed time per iteration (ms): 13557.4 | learning rate: 4.327E-06 | global batch size: 16 | lm loss: 7.551218E+00 | loss scale: 8192.0 | grad norm: 22604.906 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 976/ 159576 | consumed samples: 15616 | elapsed time per iteration (ms): 13573.2 | learning rate: 4.331E-06 | global batch size: 16 | lm loss: 7.421384E+00 | loss scale: 8192.0 | grad norm: 25754.693 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 977/ 159576 | consumed samples: 15632 | elapsed time per iteration (ms): 13891.1 | learning rate: 4.336E-06 | global batch size: 16 | lm loss: 7.421275E+00 | loss scale: 8192.0 | grad norm: 23427.022 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 978/ 159576 | consumed samples: 15648 | elapsed time per iteration (ms): 13578.3 | learning rate: 4.340E-06 | global batch size: 16 | lm loss: 7.468715E+00 | loss scale: 8192.0 | grad norm: 25697.467 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 979/ 159576 | consumed samples: 15664 | elapsed time per iteration (ms): 13602.5 | learning rate: 4.345E-06 | global batch size: 16 | lm loss: 7.679566E+00 | loss scale: 8192.0 | grad norm: 25403.982 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 980/ 159576 | consumed samples: 15680 | elapsed time per iteration (ms): 13628.8 | learning rate: 4.349E-06 | global batch size: 16 | lm loss: 7.442289E+00 | loss scale: 8192.0 | grad norm: 30230.032 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 981/ 159576 | consumed samples: 15696 | elapsed time per iteration (ms): 13812.5 | learning rate: 4.354E-06 | global batch size: 16 | lm loss: 7.521616E+00 | loss scale: 8192.0 | grad norm: 29030.478 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 982/ 159576 | consumed samples: 15712 | elapsed time per iteration (ms): 13617.0 | learning rate: 4.358E-06 | global batch size: 16 | lm loss: 7.595479E+00 | loss scale: 8192.0 | grad norm: 32518.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-24 06:03:44] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1162855_[2-10%1] on 'gpu_p13' partition)
[2021-09-24 06:03:44] PULSE: tr8-104B is running for 11:33 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8])
iteration 983/ 159576 | consumed samples: 15728 | elapsed time per iteration (ms): 13560.9 | learning rate: 4.362E-06 | global batch size: 16 | lm loss: 7.437976E+00 | loss scale: 8192.0 | grad norm: 25658.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 984/ 159576 | consumed samples: 15744 | elapsed time per iteration (ms): 13555.5 | learning rate: 4.367E-06 | global batch size: 16 | lm loss: 7.561976E+00 | loss scale: 8192.0 | grad norm: 28146.514 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 985/ 159576 | consumed samples: 15760 | elapsed time per iteration (ms): 13993.9 | learning rate: 4.371E-06 | global batch size: 16 | lm loss: 7.526425E+00 | loss scale: 8192.0 | grad norm: 22789.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 986/ 159576 | consumed samples: 15776 | elapsed time per iteration (ms): 13819.4 | learning rate: 4.376E-06 | global batch size: 16 | lm loss: 7.568769E+00 | loss scale: 8192.0 | grad norm: 29742.595 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 987/ 159576 | consumed samples: 15792 | elapsed time per iteration (ms): 13655.7 | learning rate: 4.380E-06 | global batch size: 16 | lm loss: 7.516987E+00 | loss scale: 8192.0 | grad norm: 29352.083 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 988/ 159576 | consumed samples: 15808 | elapsed time per iteration (ms): 13528.1 | learning rate: 4.385E-06 | global batch size: 16 | lm loss: 7.482485E+00 | loss scale: 8192.0 | grad norm: 23020.708 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 989/ 159576 | consumed samples: 15824 | elapsed time per iteration (ms): 13534.2 | learning rate: 4.389E-06 | global batch size: 16 | lm loss: 7.601320E+00 | loss scale: 8192.0 | grad norm: 23202.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 990/ 159576 | consumed samples: 15840 | elapsed time per iteration (ms): 13617.6 | learning rate: 4.393E-06 | global batch size: 16 | lm loss: 7.522967E+00 | loss scale: 8192.0 | grad norm: 26298.479 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 991/ 159576 | consumed samples: 15856 | elapsed time per iteration (ms): 13569.7 | learning rate: 4.398E-06 | global batch size: 16 | lm loss: 7.564295E+00 | loss scale: 8192.0 | grad norm: 30127.017 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 992/ 159576 | consumed samples: 15872 | elapsed time per iteration (ms): 13596.4 | learning rate: 4.402E-06 | global batch size: 16 | lm loss: 7.530395E+00 | loss scale: 8192.0 | grad norm: 25061.967 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 993/ 159576 | consumed samples: 15888 | elapsed time per iteration (ms): 13641.4 | learning rate: 4.407E-06 | global batch size: 16 | lm loss: 7.547958E+00 | loss scale: 8192.0 | grad norm: 24314.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 994/ 159576 | consumed samples: 15904 | elapsed time per iteration (ms): 13912.4 | learning rate: 4.411E-06 | global batch size: 16 | lm loss: 7.429228E+00 | loss scale: 8192.0 | grad norm: 28339.027 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 995/ 159576 | consumed samples: 15920 | elapsed time per iteration (ms): 13541.6 | learning rate: 4.416E-06 | global batch size: 16 | lm loss: 7.511089E+00 | loss scale: 8192.0 | grad norm: 27156.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 996/ 159576 | consumed samples: 15936 | elapsed time per iteration (ms): 13577.4 | learning rate: 4.420E-06 | global batch size: 16 | lm loss: 7.332575E+00 | loss scale: 8192.0 | grad norm: 26750.652 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 997/ 159576 | consumed samples: 15952 | elapsed time per iteration (ms): 13524.5 | learning rate: 4.425E-06 | global batch size: 16 | lm loss: 7.478838E+00 | loss scale: 8192.0 | grad norm: 30934.480 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 998/ 159576 | consumed samples: 15968 | elapsed time per iteration (ms): 13570.2 | learning rate: 4.429E-06 | global batch size: 16 | lm loss: 7.363966E+00 | loss scale: 8192.0 | grad norm: 26717.522 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 999/ 159576 | consumed samples: 15984 | elapsed time per iteration (ms): 13808.8 | learning rate: 4.433E-06 | global batch size: 16 | lm loss: 7.504936E+00 | loss scale: 8192.0 | grad norm: 33504.939 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1000/ 159576 | consumed samples: 16000 | elapsed time per iteration (ms): 13740.5 | learning rate: 4.438E-06 | global batch size: 16 | lm loss: 7.441235E+00 | loss scale: 16384.0 | grad norm: 39922.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
------------------------------------------------------------------------------------------------
 validation loss at iteration 1000 | lm loss value: 7.422922E+00 | lm loss PPL: 1.673917E+03 |
------------------------------------------------------------------------------------------------
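Two things worth noting in the block above. The printed perplexity is just the exponential of the lm loss, and the loss scale doubling from 8192.0 to 16384.0 at iteration 1000 is consistent with the fp16 config's loss_scale_window of 500: dynamic loss scaling raises the scale after a run of overflow-free steps.

```python
# Check of the validation printout: PPL = exp(lm loss).
import math
print(math.exp(7.422922))  # ~1673.9, matching "lm loss PPL: 1.673917E+03"
```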
iteration 1004/ 159576 | consumed samples: 16064 | elapsed time per iteration (ms): 13547.9 | learning rate: 4.456E-06 | global batch size: 16 | lm loss: 7.476449E+00 | loss scale: 16384.0 | grad norm: 53109.525 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1005/ 159576 | consumed samples: 16080 | elapsed time per iteration (ms): 13546.4 | learning rate: 4.460E-06 | global batch size: 16 | lm loss: 7.394112E+00 | loss scale: 16384.0 | grad norm: 62368.875 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1006/ 159576 | consumed samples: 16096 | elapsed time per iteration (ms): 13685.8 | learning rate: 4.464E-06 | global batch size: 16 | lm loss: 7.426886E+00 | loss scale: 16384.0 | grad norm: 57003.932 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1007/ 159576 | consumed samples: 16112 | elapsed time per iteration (ms): 14078.3 | learning rate: 4.469E-06 | global batch size: 16 | lm loss: 7.601004E+00 | loss scale: 16384.0 | grad norm: 62664.778 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1008/ 159576 | consumed samples: 16128 | elapsed time per iteration (ms): 13787.6 | learning rate: 4.473E-06 | global batch size: 16 | lm loss: 7.774883E+00 | loss scale: 16384.0 | grad norm: 97296.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1009/ 159576 | consumed samples: 16144 | elapsed time per iteration (ms): 13687.7 | learning rate: 4.478E-06 | global batch size: 16 | lm loss: 7.604346E+00 | loss scale: 16384.0 | grad norm: 65941.448 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1010/ 159576 | consumed samples: 16160 | elapsed time per iteration (ms): 13703.4 | learning rate: 4.482E-06 | global batch size: 16 | lm loss: 7.360181E+00 | loss scale: 16384.0 | grad norm: 64245.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1011/ 159576 | consumed samples: 16176 | elapsed time per iteration (ms): 14077.4 | learning rate: 4.487E-06 | global batch size: 16 | lm loss: 7.590093E+00 | loss scale: 16384.0 | grad norm: 66963.039 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1012/ 159576 | consumed samples: 16192 | elapsed time per iteration (ms): 13697.2 | learning rate: 4.491E-06 | global batch size: 16 | lm loss: 7.648331E+00 | loss scale: 16384.0 | grad norm: 62407.028 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1013/ 159576 | consumed samples: 16208 | elapsed time per iteration (ms): 13676.8 | learning rate: 4.496E-06 | global batch size: 16 | lm loss: 7.462048E+00 | loss scale: 16384.0 | grad norm: 76557.598 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1014/ 159576 | consumed samples: 16224 | elapsed time per iteration (ms): 13713.9 | learning rate: 4.500E-06 | global batch size: 16 | lm loss: 7.345057E+00 | loss scale: 16384.0 | grad norm: 58991.980 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1015/ 159576 | consumed samples: 16240 | elapsed time per iteration (ms): 13740.6 | learning rate: 4.504E-06 | global batch size: 16 | lm loss: 7.369339E+00 | loss scale: 16384.0 | grad norm: 76798.488 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
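
The loss scale column doubled from 8192.0 to 16384.0 at iteration 1000 and holds there through this stretch. That is the usual dynamic loss scaling behaviour in mixed-precision training: the scale is halved when a step overflows and doubled again after a fixed window of overflow-free steps. A minimal sketch of such a policy (the window and factor here are illustrative assumptions, not values read from this run, and this is not the exact Megatron implementation):

    class DynamicLossScaler:
        """Illustrative dynamic loss scaler: halve on overflow, double after
        `window` consecutive overflow-free steps."""

        def __init__(self, scale=8192.0, window=1000, factor=2.0):
            self.scale, self.window, self.factor = scale, window, factor
            self.good_steps = 0

        def update(self, found_overflow: bool) -> float:
            if found_overflow:
                self.scale /= self.factor      # back off and skip the step
                self.good_steps = 0
            else:
                self.good_steps += 1
                if self.good_steps % self.window == 0:
                    self.scale *= self.factor  # e.g. 8192.0 -> 16384.0
            return self.scale
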
iteration 1016/ 159576 | consumed samples: 16256 | elapsed time per iteration (ms): 13921.9 | learning rate: 4.509E-06 | global batch size: 16 | lm loss: 7.564117E+00 | loss scale: 16384.0 | grad norm: 64166.866 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1017/ 159576 | consumed samples: 16272 | elapsed time per iteration (ms): 13632.9 | learning rate: 4.513E-06 | global batch size: 16 | lm loss: 7.610378E+00 | loss scale: 16384.0 | grad norm: 65353.003 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1018/ 159576 | consumed samples: 16288 | elapsed time per iteration (ms): 13686.4 | learning rate: 4.518E-06 | global batch size: 16 | lm loss: 7.676594E+00 | loss scale: 16384.0 | grad norm: 64547.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1019/ 159576 | consumed samples: 16304 | elapsed time per iteration (ms): 13717.6 | learning rate: 4.522E-06 | global batch size: 16 | lm loss: 7.406422E+00 | loss scale: 16384.0 | grad norm: 63594.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1020/ 159576 | consumed samples: 16320 | elapsed time per iteration (ms): 13939.6 | learning rate: 4.527E-06 | global batch size: 16 | lm loss: 7.459125E+00 | loss scale: 16384.0 | grad norm: 59823.821 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1021/ 159576 | consumed samples: 16336 | elapsed time per iteration (ms): 13792.3 | learning rate: 4.531E-06 | global batch size: 16 | lm loss: 7.471806E+00 | loss scale: 16384.0 | grad norm: 56872.925 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1022/ 159576 | consumed samples: 16352 | elapsed time per iteration (ms): 13687.8 | learning rate: 4.536E-06 | global batch size: 16 | lm loss: 7.110139E+00 | loss scale: 16384.0 | grad norm: 58937.652 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1023/ 159576 | consumed samples: 16368 | elapsed time per iteration (ms): 13711.6 | learning rate: 4.540E-06 | global batch size: 16 | lm loss: 7.428498E+00 | loss scale: 16384.0 | grad norm: 57885.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1024/ 159576 | consumed samples: 16384 | elapsed time per iteration (ms): 14207.9 | learning rate: 4.544E-06 | global batch size: 16 | lm loss: 7.374810E+00 | loss scale: 16384.0 | grad norm: 56855.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1025/ 159576 | consumed samples: 16400 | elapsed time per iteration (ms): 13557.2 | learning rate: 4.549E-06 | global batch size: 16 | lm loss: 7.597025E+00 | loss scale: 16384.0 | grad norm: 57119.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1026/ 159576 | consumed samples: 16416 | elapsed time per iteration (ms): 13700.8 | learning rate: 4.553E-06 | global batch size: 16 | lm loss: 7.473170E+00 | loss scale: 16384.0 | grad norm: 61762.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1027/ 159576 | consumed samples: 16432 | elapsed time per iteration (ms): 13696.5 | learning rate: 4.558E-06 | global batch size: 16 | lm loss: 7.410631E+00 | loss scale: 16384.0 | grad norm: 63393.977 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1028/ 159576 | consumed samples: 16448 | elapsed time per iteration (ms): 13664.5 | learning rate: 4.562E-06 | global batch size: 16 | lm loss: 7.475993E+00 | loss scale: 16384.0 | grad norm: 61819.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1029/ 159576 | consumed samples: 16464 | elapsed time per iteration (ms): 13836.3 | learning rate: 4.567E-06 | global batch size: 16 | lm loss: 7.464800E+00 | loss scale: 16384.0 | grad norm: 52336.012 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1030/ 159576 | consumed samples: 16480 | elapsed time per iteration (ms): 13692.5 | learning rate: 4.571E-06 | global batch size: 16 | lm loss: 7.449406E+00 | loss scale: 16384.0 | grad norm: 66491.596 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1031/ 159576 | consumed samples: 16496 | elapsed time per iteration (ms): 13635.2 | learning rate: 4.575E-06 | global batch size: 16 | lm loss: 7.519850E+00 | loss scale: 16384.0 | grad norm: 65780.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1032/ 159576 | consumed samples: 16512 | elapsed time per iteration (ms): 13708.9 | learning rate: 4.580E-06 | global batch size: 16 | lm loss: 7.513804E+00 | loss scale: 16384.0 | grad norm: 62434.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1033/ 159576 | consumed samples: 16528 | elapsed time per iteration (ms): 13952.8 | learning rate: 4.584E-06 | global batch size: 16 | lm loss: 7.405169E+00 | loss scale: 16384.0 | grad norm: 74264.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1034/ 159576 | consumed samples: 16544 | elapsed time per iteration (ms): 13788.4 | learning rate: 4.589E-06 | global batch size: 16 | lm loss: 7.367761E+00 | loss scale: 16384.0 | grad norm: 75791.477 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1035/ 159576 | consumed samples: 16560 | elapsed time per iteration (ms): 13716.5 | learning rate: 4.593E-06 | global batch size: 16 | lm loss: 7.513783E+00 | loss scale: 16384.0 | grad norm: 91765.458 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1036/ 159576 | consumed samples: 16576 | elapsed time per iteration (ms): 13658.1 | learning rate: 4.598E-06 | global batch size: 16 | lm loss: 7.556536E+00 | loss scale: 16384.0 | grad norm: 76354.552 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1037/ 159576 | consumed samples: 16592 | elapsed time per iteration (ms): 13995.5 | learning rate: 4.602E-06 | global batch size: 16 | lm loss: 7.423755E+00 | loss scale: 16384.0 | grad norm: 70528.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1038/ 159576 | consumed samples: 16608 | elapsed time per iteration (ms): 13797.2 | learning rate: 4.607E-06 | global batch size: 16 | lm loss: 7.452043E+00 | loss scale: 16384.0 | grad norm: 63200.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1039/ 159576 | consumed samples: 16624 | elapsed time per iteration (ms): 13728.6 | learning rate: 4.611E-06 | global batch size: 16 | lm loss: 7.310857E+00 | loss scale: 16384.0 | grad norm: 135045.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1040/ 159576 | consumed samples: 16640 | elapsed time per iteration (ms): 13690.2 | learning rate: 4.615E-06 | global batch size: 16 | lm loss: 7.374257E+00 | loss scale: 16384.0 | grad norm: 69159.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1041/ 159576 | consumed samples: 16656 | elapsed time per iteration (ms): 13682.9 | learning rate: 4.620E-06 | global batch size: 16 | lm loss: 7.498551E+00 | loss scale: 16384.0 | grad norm: 67982.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1042/ 159576 | consumed samples: 16672 | elapsed time per iteration (ms): 13991.8 | learning rate: 4.624E-06 | global batch size: 16 | lm loss: 7.373695E+00 | loss scale: 16384.0 | grad norm: 75175.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1043/ 159576 | consumed samples: 16688 | elapsed time per iteration (ms): 13721.4 | learning rate: 4.629E-06 | global batch size: 16 | lm loss: 7.642927E+00 | loss scale: 16384.0 | grad norm: 103318.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1044/ 159576 | consumed samples: 16704 | elapsed time per iteration (ms): 13718.3 | learning rate: 4.633E-06 | global batch size: 16 | lm loss: 7.423826E+00 | loss scale: 16384.0 | grad norm: 71060.972 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1045/ 159576 | consumed samples: 16720 | elapsed time per iteration (ms): 13604.4 | learning rate: 4.638E-06 | global batch size: 16 | lm loss: 7.362212E+00 | loss scale: 16384.0 | grad norm: 81169.902 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1046/ 159576 | consumed samples: 16736 | elapsed time per iteration (ms): 14075.1 | learning rate: 4.642E-06 | global batch size: 16 | lm loss: 7.450203E+00 | loss scale: 16384.0 | grad norm: 83510.606 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1047/ 159576 | consumed samples: 16752 | elapsed time per iteration (ms): 13677.3 | learning rate: 4.646E-06 | global batch size: 16 | lm loss: 7.554290E+00 | loss scale: 16384.0 | grad norm: 81988.459 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1048/ 159576 | consumed samples: 16768 | elapsed time per iteration (ms): 13606.4 | learning rate: 4.651E-06 | global batch size: 16 | lm loss: 7.327914E+00 | loss scale: 16384.0 | grad norm: 71618.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1049/ 159576 | consumed samples: 16784 | elapsed time per iteration (ms): 13669.1 | learning rate: 4.655E-06 | global batch size: 16 | lm loss: 7.596028E+00 | loss scale: 16384.0 | grad norm: 76665.796 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1050/ 159576 | consumed samples: 16800 | elapsed time per iteration (ms): 13708.7 | learning rate: 4.660E-06 | global batch size: 16 | lm loss: 7.326102E+00 | loss scale: 16384.0 | grad norm: 83331.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1051/ 159576 | consumed samples: 16816 | elapsed time per iteration (ms): 13981.1 | learning rate: 4.664E-06 | global batch size: 16 | lm loss: 7.619492E+00 | loss scale: 16384.0 | grad norm: 82397.650 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1052/ 159576 | consumed samples: 16832 | elapsed time per iteration (ms): 13516.4 | learning rate: 4.669E-06 | global batch size: 16 | lm loss: 7.530663E+00 | loss scale: 16384.0 | grad norm: 56319.745 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1053/ 159576 | consumed samples: 16848 | elapsed time per iteration (ms): 13647.6 | learning rate: 4.673E-06 | global batch size: 16 | lm loss: 7.443875E+00 | loss scale: 16384.0 | grad norm: 72562.436 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1054/ 159576 | consumed samples: 16864 | elapsed time per iteration (ms): 13627.5 | learning rate: 4.678E-06 | global batch size: 16 | lm loss: 7.479875E+00 | loss scale: 16384.0 | grad norm: 61495.093 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1055/ 159576 | consumed samples: 16880 | elapsed time per iteration (ms): 14065.0 | learning rate: 4.682E-06 | global batch size: 16 | lm loss: 7.612121E+00 | loss scale: 16384.0 | grad norm: 112310.814 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1056/ 159576 | consumed samples: 16896 | elapsed time per iteration (ms): 13707.4 | learning rate: 4.686E-06 | global batch size: 16 | lm loss: 7.408166E+00 | loss scale: 16384.0 | grad norm: 92018.659 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1057/ 159576 | consumed samples: 16912 | elapsed time per iteration (ms): 13656.1 | learning rate: 4.691E-06 | global batch size: 16 | lm loss: 7.422934E+00 | loss scale: 16384.0 | grad norm: 67279.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1058/ 159576 | consumed samples: 16928 | elapsed time per iteration (ms): 13676.8 | learning rate: 4.695E-06 | global batch size: 16 | lm loss: 7.397638E+00 | loss scale: 16384.0 | grad norm: 87601.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1059/ 159576 | consumed samples: 16944 | elapsed time per iteration (ms): 14053.0 | learning rate: 4.700E-06 | global batch size: 16 | lm loss: 7.514566E+00 | loss scale: 16384.0 | grad norm: 115639.831 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1060/ 159576 | consumed samples: 16960 | elapsed time per iteration (ms): 13722.6 | learning rate: 4.704E-06 | global batch size: 16 | lm loss: 7.310302E+00 | loss scale: 16384.0 | grad norm: 142865.091 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1061/ 159576 | consumed samples: 16976 | elapsed time per iteration (ms): 13679.9 | learning rate: 4.709E-06 | global batch size: 16 | lm loss: 7.399222E+00 | loss scale: 16384.0 | grad norm: 100646.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1062/ 159576 | consumed samples: 16992 | elapsed time per iteration (ms): 13634.5 | learning rate: 4.713E-06 | global batch size: 16 | lm loss: 7.332808E+00 | loss scale: 16384.0 | grad norm: 66218.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
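
Two derived figures worth noting from the stretch above: the consumed samples column is exactly iteration x global batch size (e.g. 1050 x 16 = 16800), and at roughly 13.7 s per iteration the run advances about 1.2 samples per second, which puts the remaining schedule at around 25 more days at this pace. A small arithmetic check (pure arithmetic on logged values, nothing read from the run itself):

    # Bookkeeping check and throughput estimate from the records above.
    iteration, total_iters, global_batch = 1050, 159576, 16
    assert iteration * global_batch == 16800       # consumed samples at iteration 1050

    elapsed_ms = 13708.7                           # elapsed time per iteration at 1050
    samples_per_sec = global_batch / (elapsed_ms / 1e3)
    days_left = (total_iters - iteration) * elapsed_ms / 1e3 / 86400
    print(f"{samples_per_sec:.2f} samples/s, ~{days_left:.0f} days left at this pace")
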
iteration 1063/ 159576 | consumed samples: 17008 | elapsed time per iteration (ms): 13663.6 | learning rate: 4.717E-06 | global batch size: 16 | lm loss: 7.490856E+00 | loss scale: 16384.0 | grad norm: 127442.068 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1064/ 159576 | consumed samples: 17024 | elapsed time per iteration (ms): 13909.0 | learning rate: 4.722E-06 | global batch size: 16 | lm loss: 7.693977E+00 | loss scale: 16384.0 | grad norm: 101533.485 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1065/ 159576 | consumed samples: 17040 | elapsed time per iteration (ms): 13658.8 | learning rate: 4.726E-06 | global batch size: 16 | lm loss: 7.565272E+00 | loss scale: 16384.0 | grad norm: 87035.171 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1066/ 159576 | consumed samples: 17056 | elapsed time per iteration (ms): 13679.2 | learning rate: 4.731E-06 | global batch size: 16 | lm loss: 7.790638E+00 | loss scale: 16384.0 | grad norm: 86411.886 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1067/ 159576 | consumed samples: 17072 | elapsed time per iteration (ms): 13759.2 | learning rate: 4.735E-06 | global batch size: 16 | lm loss: 7.438931E+00 | loss scale: 16384.0 | grad norm: 65756.645 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1068/ 159576 | consumed samples: 17088 | elapsed time per iteration (ms): 14138.1 | learning rate: 4.740E-06 | global batch size: 16 | lm loss: 7.361547E+00 | loss scale: 16384.0 | grad norm: 130711.456 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1069/ 159576 | consumed samples: 17104 | elapsed time per iteration (ms): 13687.8 | learning rate: 4.744E-06 | global batch size: 16 | lm loss: 7.413251E+00 | loss scale: 16384.0 | grad norm: 58324.579 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1070/ 159576 | consumed samples: 17120 | elapsed time per iteration (ms): 13637.9 | learning rate: 4.749E-06 | global batch size: 16 | lm loss: 7.397507E+00 | loss scale: 16384.0 | grad norm: 89260.600 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1071/ 159576 | consumed samples: 17136 | elapsed time per iteration (ms): 13680.2 | learning rate: 4.753E-06 | global batch size: 16 | lm loss: 7.535676E+00 | loss scale: 16384.0 | grad norm: 74408.995 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1072/ 159576 | consumed samples: 17152 | elapsed time per iteration (ms): 14062.2 | learning rate: 4.757E-06 | global batch size: 16 | lm loss: 7.411667E+00 | loss scale: 16384.0 | grad norm: 77225.681 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1073/ 159576 | consumed samples: 17168 | elapsed time per iteration (ms): 13681.2 | learning rate: 4.762E-06 | global batch size: 16 | lm loss: 7.394706E+00 | loss scale: 16384.0 | grad norm: 78590.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1074/ 159576 | consumed samples: 17184 | elapsed time per iteration (ms): 13709.1 | learning rate: 4.766E-06 | global batch size: 16 | lm loss: 7.616404E+00 | loss scale: 16384.0 | grad norm: 82722.799 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1075/ 159576 | consumed samples: 17200 | elapsed time per iteration (ms): 13743.2 | learning rate: 4.771E-06 | global batch size: 16 | lm loss: 7.395072E+00 | loss scale: 16384.0 | grad norm: 63549.807 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1076/ 159576 | consumed samples: 17216 | elapsed time per iteration (ms): 13619.1 | learning rate: 4.775E-06 | global batch size: 16 | lm loss: 7.593513E+00 | loss scale: 16384.0 | grad norm: 100985.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1077/ 159576 | consumed samples: 17232 | elapsed time per iteration (ms): 13859.6 | learning rate: 4.780E-06 | global batch size: 16 | lm loss: 7.379070E+00 | loss scale: 16384.0 | grad norm: 56935.671 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1078/ 159576 | consumed samples: 17248 | elapsed time per iteration (ms): 13589.7 | learning rate: 4.784E-06 | global batch size: 16 | lm loss: 7.412032E+00 | loss scale: 16384.0 | grad norm: 93391.483 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1079/ 159576 | consumed samples: 17264 | elapsed time per iteration (ms): 13575.0 | learning rate: 4.788E-06 | global batch size: 16 | lm loss: 7.485137E+00 | loss scale: 16384.0 | grad norm: 70759.989 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1080/ 159576 | consumed samples: 17280 | elapsed time per iteration (ms): 13590.9 | learning rate: 4.793E-06 | global batch size: 16 | lm loss: 7.410018E+00 | loss scale: 16384.0 | grad norm: 108070.843 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1081/ 159576 | consumed samples: 17296 | elapsed time per iteration (ms): 13934.8 | learning rate: 4.797E-06 | global batch size: 16 | lm loss: 7.444709E+00 | loss scale: 16384.0 | grad norm: 93912.071 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1082/ 159576 | consumed samples: 17312 | elapsed time per iteration (ms): 13598.4 | learning rate: 4.802E-06 | global batch size: 16 | lm loss: 7.532929E+00 | loss scale: 16384.0 | grad norm: 76683.978 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1083/ 159576 | consumed samples: 17328 | elapsed time per iteration (ms): 13510.5 | learning rate: 4.806E-06 | global batch size: 16 | lm loss: 7.599612E+00 | loss scale: 16384.0 | grad norm: 83858.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1084/ 159576 | consumed samples: 17344 | elapsed time per iteration (ms): 13542.7 | learning rate: 4.811E-06 | global batch size: 16 | lm loss: 7.387773E+00 | loss scale: 16384.0 | grad norm: 63120.576 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1085/ 159576 | consumed samples: 17360 | elapsed time per iteration (ms): 13555.5 | learning rate: 4.815E-06 | global batch size: 16 | lm loss: 7.289794E+00 | loss scale: 16384.0 | grad norm: 77022.669 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1086/ 159576 | consumed samples: 17376 | elapsed time per iteration (ms): 13932.5 | learning rate: 4.820E-06 | global batch size: 16 | lm loss: 7.393349E+00 | loss scale: 16384.0 | grad norm: 79433.611 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1087/ 159576 | consumed samples: 17392 | elapsed time per iteration (ms): 13479.9 | learning rate: 4.824E-06 | global batch size: 16 | lm loss: 7.321753E+00 | loss scale: 16384.0 | grad norm: 68970.976 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1088/ 159576 | consumed samples: 17408 | elapsed time per iteration (ms): 13681.0 | learning rate: 4.828E-06 | global batch size: 16 | lm loss: 7.320374E+00 | loss scale: 16384.0 | grad norm: 73549.447 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1089/ 159576 | consumed samples: 17424 | elapsed time per iteration (ms): 13654.0 | learning rate: 4.833E-06 | global batch size: 16 | lm loss: 7.605762E+00 | loss scale: 16384.0 | grad norm: 80374.482 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1090/ 159576 | consumed samples: 17440 | elapsed time per iteration (ms): 14059.3 | learning rate: 4.837E-06 | global batch size: 16 | lm loss: 7.631133E+00 | loss scale: 16384.0 | grad norm: 82954.080 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1091/ 159576 | consumed samples: 17456 | elapsed time per iteration (ms): 13724.8 | learning rate: 4.842E-06 | global batch size: 16 | lm loss: 7.507143E+00 | loss scale: 16384.0 | grad norm: 60066.048 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1092/ 159576 | consumed samples: 17472 | elapsed time per iteration (ms): 13461.4 | learning rate: 4.846E-06 | global batch size: 16 | lm loss: 7.300464E+00 | loss scale: 16384.0 | grad norm: 116487.793 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1093/ 159576 | consumed samples: 17488 | elapsed time per iteration (ms): 13525.0 | learning rate: 4.851E-06 | global batch size: 16 | lm loss: 7.388405E+00 | loss scale: 16384.0 | grad norm: 79147.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1094/ 159576 | consumed samples: 17504 | elapsed time per iteration (ms): 13950.4 | learning rate: 4.855E-06 | global batch size: 16 | lm loss: 7.471725E+00 | loss scale: 16384.0 | grad norm: 90987.897 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1095/ 159576 | consumed samples: 17520 | elapsed time per iteration (ms): 13624.6 | learning rate: 4.859E-06 | global batch size: 16 | lm loss: 7.530853E+00 | loss scale: 16384.0 | grad norm: 90057.826 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1096/ 159576 | consumed samples: 17536 | elapsed time per iteration (ms): 13591.9 | learning rate: 4.864E-06 | global batch size: 16 | lm loss: 7.420722E+00 | loss scale: 16384.0 | grad norm: 76037.442 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1097/ 159576 | consumed samples: 17552 | elapsed time per iteration (ms): 13587.0 | learning rate: 4.868E-06 | global batch size: 16 | lm loss: 7.363769E+00 | loss scale: 16384.0 | grad norm: 107388.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1098/ 159576 | consumed samples: 17568 | elapsed time per iteration (ms): 13667.8 | learning rate: 4.873E-06 | global batch size: 16 | lm loss: 7.310038E+00 | loss scale: 16384.0 | grad norm: 72408.477 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1099/ 159576 | consumed samples: 17584 | elapsed time per iteration (ms): 13707.4 | learning rate: 4.877E-06 | global batch size: 16 | lm loss: 7.291698E+00 | loss scale: 16384.0 | grad norm: 69292.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1100/ 159576 | consumed samples: 17600 | elapsed time per iteration (ms): 13564.5 | learning rate: 4.882E-06 | global batch size: 16 | lm loss: 7.713614E+00 | loss scale: 16384.0 | grad norm: 87150.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1101/ 159576 | consumed samples: 17616 | elapsed time per iteration (ms): 13621.9 | learning rate: 4.886E-06 | global batch size: 16 | lm loss: 7.482057E+00 | loss scale: 16384.0 | grad norm: 61713.123 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1102/ 159576 | consumed samples: 17632 | elapsed time per iteration (ms): 13628.2 | learning rate: 4.891E-06 | global batch size: 16 | lm loss: 7.370234E+00 | loss scale: 16384.0 | grad norm: 83708.630 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1103/ 159576 | consumed samples: 17648 | elapsed time per iteration (ms): 13962.7 | learning rate: 4.895E-06 | global batch size: 16 | lm loss: 7.373138E+00 | loss scale: 16384.0 | grad norm: 75905.969 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1104/ 159576 | consumed samples: 17664 | elapsed time per iteration (ms): 13627.3 | learning rate: 4.899E-06 | global batch size: 16 | lm loss: 7.448909E+00 | loss scale: 16384.0 | grad norm: 135141.473 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1105/ 159576 | consumed samples: 17680 | elapsed time per iteration (ms): 13640.6 | learning rate: 4.904E-06 | global batch size: 16 | lm loss: 7.252520E+00 | loss scale: 16384.0 | grad norm: 73661.038 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1106/ 159576 | consumed samples: 17696 | elapsed time per iteration (ms): 13666.3 | learning rate: 4.908E-06 | global batch size: 16 | lm loss: 7.507257E+00 | loss scale: 16384.0 | grad norm: 108098.635 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1107/ 159576 | consumed samples: 17712 | elapsed time per iteration (ms): 13849.3 | learning rate: 4.913E-06 | global batch size: 16 | lm loss: 7.429738E+00 | loss scale: 16384.0 | grad norm: 99851.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1108/ 159576 | consumed samples: 17728 | elapsed time per iteration (ms): 13862.9 | learning rate: 4.917E-06 | global batch size: 16 | lm loss: 7.422798E+00 | loss scale: 16384.0 | grad norm: 90788.540 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1109/ 159576 | consumed samples: 17744 | elapsed time per iteration (ms): 13640.2 | learning rate: 4.922E-06 | global batch size: 16 | lm loss: 7.656183E+00 | loss scale: 16384.0 | grad norm: 204462.632 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1110/ 159576 | consumed samples: 17760 | elapsed time per iteration (ms): 13627.0 | learning rate: 4.926E-06 | global batch size: 16 | lm loss: 7.576304E+00 | loss scale: 16384.0 | grad norm: 166002.012 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1111/ 159576 | consumed samples: 17776 | elapsed time per iteration (ms): 13632.9 | learning rate: 4.930E-06 | global batch size: 16 | lm loss: 7.626440E+00 | loss scale: 16384.0 | grad norm: 82466.643 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1112/ 159576 | consumed samples: 17792 | elapsed time per iteration (ms): 13939.0 | learning rate: 4.935E-06 | global batch size: 16 | lm loss: 7.302793E+00 | loss scale: 16384.0 | grad norm: 150100.520 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1113/ 159576 | consumed samples: 17808 | elapsed time per iteration (ms): 13640.4 | learning rate: 4.939E-06 | global batch size: 16 | lm loss: 7.493092E+00 | loss scale: 16384.0 | grad norm: 104956.045 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1114/ 159576 | consumed samples: 17824 | elapsed time per iteration (ms): 13637.6 | learning rate: 4.944E-06 | global batch size: 16 | lm loss: 7.475542E+00 | loss scale: 16384.0 | grad norm: 86316.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1115/ 159576 | consumed samples: 17840 | elapsed time per iteration (ms): 13630.5 | learning rate: 4.948E-06 | global batch size: 16 | lm loss: 7.367518E+00 | loss scale: 16384.0 | grad norm: 127229.616 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1116/ 159576 | consumed samples: 17856 | elapsed time per iteration (ms): 13929.1 | learning rate: 4.953E-06 | global batch size: 16 | lm loss: 7.463512E+00 | loss scale: 16384.0 | grad norm: 80765.100 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1117/ 159576 | consumed samples: 17872 | elapsed time per iteration (ms): 13651.9 | learning rate: 4.957E-06 | global batch size: 16 | lm loss: 7.389682E+00 | loss scale: 16384.0 | grad norm: 114274.057 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1118/ 159576 | consumed samples: 17888 | elapsed time per iteration (ms): 13673.8 | learning rate: 4.962E-06 | global batch size: 16 | lm loss: 7.446970E+00 | loss scale: 16384.0 | grad norm: 93011.728 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1119/ 159576 | consumed samples: 17904 | elapsed time per iteration (ms): 13700.2 | learning rate: 4.966E-06 | global batch size: 16 | lm loss: 7.314221E+00 | loss scale: 16384.0 | grad norm: 105575.833 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1120/ 159576 | consumed samples: 17920 | elapsed time per iteration (ms): 13702.7 | learning rate: 4.970E-06 | global batch size: 16 | lm loss: 7.372279E+00 | loss scale: 16384.0 | grad norm: 77507.701 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1121/ 159576 | consumed samples: 17936 | elapsed time per iteration (ms): 13869.6 | learning rate: 4.975E-06 | global batch size: 16 | lm loss: 7.535093E+00 | loss scale: 16384.0 | grad norm: 98620.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1122/ 159576 | consumed samples: 17952 | elapsed time per iteration (ms): 13679.6 | learning rate: 4.979E-06 | global batch size: 16 | lm loss: 8.079200E+00 | loss scale: 16384.0 | grad norm: 187332.489 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1123/ 159576 | consumed samples: 17968 | elapsed time per iteration (ms): 13672.8 | learning rate: 4.984E-06 | global batch size: 16 | lm loss: 7.433456E+00 | loss scale: 16384.0 | grad norm: 139834.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1124/ 159576 | consumed samples: 17984 | elapsed time per iteration (ms): 13651.7 | learning rate: 4.988E-06 | global batch size: 16 | lm loss: 7.440439E+00 | loss scale: 16384.0 | grad norm: 91486.607 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1125/ 159576 | consumed samples: 18000 | elapsed time per iteration (ms): 14085.1 | learning rate: 4.993E-06 | global batch size: 16 | lm loss: 7.453449E+00 | loss scale: 16384.0 | grad norm: 170685.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1126/ 159576 | consumed samples: 18016 | elapsed time per iteration (ms): 13744.0 | learning rate: 4.997E-06 | global batch size: 16 | lm loss: 7.544756E+00 | loss scale: 16384.0 | grad norm: 93482.948 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1127/ 159576 | consumed samples: 18032 | elapsed time per iteration (ms): 13666.9 | learning rate: 5.001E-06 | global batch size: 16 | lm loss: 7.435877E+00 | loss scale: 16384.0 | grad norm: 98259.154 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1128/ 159576 | consumed samples: 18048 | elapsed time per iteration (ms): 13692.7 | learning rate: 5.006E-06 | global batch size: 16 | lm loss: 7.496342E+00 | loss scale: 16384.0 | grad norm: 130279.795 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1129/ 159576 | consumed samples: 18064 | elapsed time per iteration (ms): 14100.4 | learning rate: 5.010E-06 | global batch size: 16 | lm loss: 7.501980E+00 | loss scale: 16384.0 | grad norm: 88561.836 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1130/ 159576 | consumed samples: 18080 | elapsed time per iteration (ms): 13620.7 | learning rate: 5.015E-06 | global batch size: 16 | lm loss: 7.470133E+00 | loss scale: 16384.0 | grad norm: 155289.997 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1131/ 159576 | consumed samples: 18096 | elapsed time per iteration (ms): 13683.0 | learning rate: 5.019E-06 | global batch size: 16 | lm loss: 7.539918E+00 | loss scale: 16384.0 | grad norm: 89135.032 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1132/ 159576 | consumed samples: 18112 | elapsed time per iteration (ms): 13643.2 | learning rate: 5.024E-06 | global batch size: 16 | lm loss: 7.537309E+00 | loss scale: 16384.0 | grad norm: 83460.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1133/ 159576 | consumed samples: 18128 | elapsed time per iteration (ms): 13758.8 | learning rate: 5.028E-06 | global batch size: 16 | lm loss: 7.445082E+00 | loss scale: 16384.0 | grad norm: 97599.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1134/ 159576 | consumed samples: 18144 | elapsed time per iteration (ms): 13842.3 | learning rate: 5.033E-06 | global batch size: 16 | lm loss: 7.533705E+00 | loss scale: 16384.0 | grad norm: 153106.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1135/ 159576 | consumed samples: 18160 | elapsed time per iteration (ms): 13641.3 | learning rate: 5.037E-06 | global batch size: 16 | lm loss: 7.351761E+00 | loss scale: 16384.0 | grad norm: 139552.025 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1136/ 159576 | consumed samples: 18176 | elapsed time per iteration (ms): 13757.6 | learning rate: 5.041E-06 | global batch size: 16 | lm loss: 7.386802E+00 | loss scale: 16384.0 | grad norm: 82271.014 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1137/ 159576 | consumed samples: 18192 | elapsed time per iteration (ms): 13590.7 | learning rate: 5.046E-06 | global batch size: 16 | lm loss: 7.276345E+00 | loss scale: 16384.0 | grad norm: 139306.896 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1138/ 159576 | consumed samples: 18208 | elapsed time per iteration (ms): 14099.6 | learning rate: 5.050E-06 | global batch size: 16 | lm loss: 7.489694E+00 | loss scale: 16384.0 | grad norm: 75568.533 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1139/ 159576 | consumed samples: 18224 | elapsed time per iteration (ms): 13765.0 | learning rate: 5.055E-06 | global batch size: 16 | lm loss: 6.968816E+00 | loss scale: 16384.0 | grad norm: 118020.093 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1140/ 159576 | consumed samples: 18240 | elapsed time per iteration (ms): 13662.4 | learning rate: 5.059E-06 | global batch size: 16 | lm loss: 7.446542E+00 | loss scale: 16384.0 | grad norm: 117497.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1141/ 159576 | consumed samples: 18256 | elapsed time per iteration (ms): 13747.0 | learning rate: 5.064E-06 | global batch size: 16 | lm loss: 7.328124E+00 | loss scale: 16384.0 | grad norm: 126653.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1142/ 159576 | consumed samples: 18272 | elapsed time per iteration (ms): 14086.2 | learning rate: 5.068E-06 | global batch size: 16 | lm loss: 7.359120E+00 | loss scale: 16384.0 | grad norm: 158587.176 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1143/ 159576 | consumed samples: 18288 | elapsed time per iteration (ms): 13785.6 | learning rate: 5.072E-06 | global batch size: 16 | lm loss: 7.289187E+00 | loss scale: 16384.0 | grad norm: 93193.500 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1144/ 159576 | consumed samples: 18304 | elapsed time per iteration (ms): 13650.1 | learning rate: 5.077E-06 | global batch size: 16 | lm loss: 7.541381E+00 | loss scale: 16384.0 | grad norm: 127276.458 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1145/ 159576 | consumed samples: 18320 | elapsed time per iteration (ms): 13673.3 | learning rate: 5.081E-06 | global batch size: 16 | lm loss: 7.343310E+00 | loss scale: 16384.0 | grad norm: 141086.682 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1146/ 159576 | consumed samples: 18336 | elapsed time per iteration (ms): 13709.3 | learning rate: 5.086E-06 | global batch size: 16 | lm loss: 7.291780E+00 | loss scale: 16384.0 | grad norm: 84706.443 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1147/ 159576 | consumed samples: 18352 | elapsed time per iteration (ms): 13798.7 | learning rate: 5.090E-06 | global batch size: 16 | lm loss: 7.395382E+00 | loss scale: 16384.0 | grad norm: 168181.547 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1148/ 159576 | consumed samples: 18368 | elapsed time per iteration (ms): 13678.3 | learning rate: 5.095E-06 | global batch size: 16 | lm loss: 7.287755E+00 | loss scale: 16384.0 | grad norm: 150595.173 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1149/ 159576 | consumed samples: 18384 | elapsed time per iteration (ms): 13705.6 | learning rate: 5.099E-06 | global batch size: 16 | lm loss: 7.521116E+00 | loss scale: 16384.0 | grad norm: 90594.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1150/ 159576 | consumed samples: 18400 | elapsed time per iteration (ms): 13724.2 | learning rate: 5.104E-06 | global batch size: 16 | lm loss: 7.560548E+00 | loss scale: 16384.0 | grad norm: 124093.174 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1151/ 159576 | consumed samples: 18416 | elapsed time per iteration (ms): 14011.4 | learning rate: 5.108E-06 | global batch size: 16 | lm loss: 7.334007E+00 | loss scale: 16384.0 | grad norm: 93590.799 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1152/ 159576 | consumed samples: 18432 | elapsed time per iteration (ms): 13638.1 | learning rate: 5.112E-06 | global batch size: 16 | lm loss: 7.340695E+00 | loss scale: 16384.0 | grad norm: 120515.541 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1153/ 159576 | consumed samples: 18448 | elapsed time per iteration (ms): 13670.9 | learning rate: 5.117E-06 | global batch size: 16 | lm loss: 7.310359E+00 | loss scale: 16384.0 | grad norm: 121580.561 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1154/ 159576 | consumed samples: 18464 | elapsed time per iteration (ms): 13692.4 | learning rate: 5.121E-06 | global batch size: 16 | lm loss: 7.407881E+00 | loss scale: 16384.0 | grad norm: 86210.472 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1155/ 159576 | consumed samples: 18480 | elapsed time per iteration (ms): 14124.7 | learning rate: 5.126E-06 | global batch size: 16 | lm loss: 7.533539E+00 | loss scale: 16384.0 | grad norm: 117499.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1156/ 159576 | consumed samples: 18496 | elapsed time per iteration (ms): 13713.9 | learning rate: 5.130E-06 | global batch size: 16 | lm loss: 7.454373E+00 | loss scale: 16384.0 | grad norm: 82164.881 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1157/ 159576 | consumed samples: 18512 | elapsed time per iteration (ms): 13665.0 | learning rate: 5.135E-06 | global batch size: 16 | lm loss: 6.997806E+00 | loss scale: 16384.0 | grad norm: 118291.842 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1158/ 159576 | consumed samples: 18528 | elapsed time per iteration (ms): 13620.7 | learning rate: 5.139E-06 | global batch size: 16 | lm loss: 7.155181E+00 | loss scale: 16384.0 | grad norm: 80841.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1159/ 159576 | consumed samples: 18544 | elapsed time per iteration (ms): 13522.0 | learning rate: 5.143E-06 | global batch size: 16 | lm loss: 7.303053E+00 | loss scale: 16384.0 | grad norm: 153692.954 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1160/ 159576 | consumed samples: 18560 | elapsed time per iteration (ms): 13934.6 | learning rate: 5.148E-06 | global batch size: 16 | lm loss: 7.453541E+00 | loss scale: 16384.0 | grad norm: 178564.006 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1161/ 159576 | consumed samples: 18576 | elapsed time per iteration (ms): 13591.1 | learning rate: 5.152E-06 | global batch size: 16 | lm loss: 7.370741E+00 | loss scale: 16384.0 | grad norm: 96828.834 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1162/ 159576 | consumed samples: 18592 | elapsed time per iteration (ms): 13610.9 | learning rate: 5.157E-06 | global batch size: 16 | lm loss: 7.395625E+00 | loss scale: 16384.0 | grad norm: 138531.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1163/ 159576 | consumed samples: 18608 | elapsed time per iteration (ms): 13633.4 | learning rate: 5.161E-06 | global batch size: 16 | lm loss: 7.721334E+00 | loss scale: 16384.0 | grad norm: 107198.076 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1164/ 159576 | consumed samples: 18624 | elapsed time per iteration (ms): 13919.7 | learning rate: 5.166E-06 | global batch size: 16 | lm loss: 7.418262E+00 | loss scale: 16384.0 | grad norm: 104593.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1165/ 159576 | consumed samples: 18640 | elapsed time per iteration (ms): 13699.8 | learning rate: 5.170E-06 | global batch size: 16 | lm loss: 7.388452E+00 | loss scale: 16384.0 | grad norm: 87922.625 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1166/ 159576 | consumed samples: 18656 | elapsed time per iteration (ms): 13567.0 | learning rate: 5.175E-06 | global batch size: 16 | lm loss: 7.359789E+00 | loss scale: 16384.0 | grad norm: 167490.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1167/ 159576 | consumed samples: 18672 | elapsed time per iteration (ms): 13665.3 | learning rate: 5.179E-06 | global batch size: 16 | lm loss: 7.513920E+00 | loss scale: 16384.0 | grad norm: 187148.881 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1168/ 159576 | consumed samples: 18688 | elapsed time per iteration (ms): 13712.9 | learning rate: 5.183E-06 | global batch size: 16 | lm loss: 7.333634E+00 | loss scale: 16384.0 | grad norm: 80524.927 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1169/ 159576 | consumed samples: 18704 | elapsed time per iteration (ms): 13807.4 | learning rate: 5.188E-06 | global batch size: 16 | lm loss: 7.551642E+00 | loss scale: 16384.0 | grad norm: 96715.430 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1170/ 159576 | consumed samples: 18720 | elapsed time per iteration (ms): 13672.0 | learning rate: 5.192E-06 | global batch size: 16 | lm loss: 7.354926E+00 | loss scale: 16384.0 | grad norm: 108931.618 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1171/ 159576 | consumed samples: 18736 | elapsed time per iteration (ms): 13735.2 | learning rate: 5.197E-06 | global batch size: 16 | lm loss: 7.360828E+00 | loss scale: 16384.0 | grad norm: 93043.561 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1172/ 159576 | consumed samples: 18752 | elapsed time per iteration (ms): 13717.8 | learning rate: 5.201E-06 | global batch size: 16 | lm loss: 7.538117E+00 | loss scale: 16384.0 | grad norm: 318365.891 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1173/ 159576 | consumed samples: 18768 | elapsed time per iteration (ms): 13883.3 | learning rate: 5.206E-06 | global batch size: 16 | lm loss: 7.601986E+00 | loss scale: 16384.0 | grad norm: 139775.022 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1174/ 159576 | consumed samples: 18784 | elapsed time per iteration (ms): 13707.5 | learning rate: 5.210E-06 | global batch size: 16 | lm loss: 7.492588E+00 | loss scale: 16384.0 | grad norm: 90689.919 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1175/ 159576 | consumed samples: 18800 | elapsed time per iteration (ms): 13678.7 | learning rate: 5.214E-06 | global batch size: 16 | lm loss: 7.586353E+00 | loss scale: 16384.0 | grad norm: 123587.039 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1176/ 159576 | consumed samples: 18816 | elapsed time per iteration (ms): 13643.8 | learning rate: 5.219E-06 | global batch size: 16 | lm loss: 7.585982E+00 | loss scale: 16384.0 | grad norm: 134121.461 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1177/ 159576 | consumed samples: 18832 | elapsed time per iteration (ms): 13876.6 | learning rate: 5.223E-06 | global batch size: 16 | lm loss: 7.290177E+00 | loss scale: 16384.0 | grad norm: 61795.500 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1178/ 159576 | consumed samples: 18848 | elapsed time per iteration (ms): 13887.6 | learning rate: 5.228E-06 | global batch size: 16 | lm loss: 7.394442E+00 | loss scale: 16384.0 | grad norm: 214580.050 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1179/ 159576 | consumed samples: 18864 | elapsed time per iteration (ms): 13671.2 | learning rate: 5.232E-06 | global batch size: 16 | lm loss: 7.342830E+00 | loss scale: 16384.0 | grad norm: 170377.555 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1180/ 159576 | consumed samples: 18880 | elapsed time per iteration (ms): 13615.6 | learning rate: 5.237E-06 | global batch size: 16 | lm loss: 7.353875E+00 | loss scale: 16384.0 | grad norm: 98364.101 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1181/ 159576 | consumed samples: 18896 | elapsed time per iteration (ms): 13659.2 | learning rate: 5.241E-06 | global batch size: 16 | lm loss: 7.310112E+00 | loss scale: 16384.0 | grad norm: 153347.882 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1182/ 159576 | consumed samples: 18912 | elapsed time per iteration (ms): 13718.2 | learning rate: 5.246E-06 | global batch size: 16 | lm loss: 7.516181E+00 | loss scale: 16384.0 | grad norm: 183425.509 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1183/ 159576 | consumed samples: 18928 | elapsed time per iteration (ms): 13614.7 | learning rate: 5.250E-06 | global batch size: 16 | lm loss: 7.284205E+00 | loss scale: 16384.0 | grad norm: 116539.767 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1184/ 159576 | consumed samples: 18944 | elapsed time per iteration (ms): 13636.1 | learning rate: 5.254E-06 | global batch size: 16 | lm loss: 7.392292E+00 | loss scale: 16384.0 | grad norm: 167498.612 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1185/ 159576 | consumed samples: 18960 | elapsed time per iteration (ms): 13633.9 | learning rate: 5.259E-06 | global batch size: 16 | lm loss: 7.250909E+00 | loss scale: 16384.0 | grad norm: 100955.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1186/ 159576 | consumed samples: 18976 | elapsed time per iteration (ms): 13999.4 | learning rate: 5.263E-06 | global batch size: 16 | lm loss: 7.536862E+00 | loss scale: 16384.0 | grad norm: 100050.160 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1187/ 159576 | consumed samples: 18992 | elapsed time per iteration (ms): 13653.6 | learning rate: 5.268E-06 | global batch size: 16 | lm loss: 7.565104E+00 | loss scale: 16384.0 | grad norm: 118619.018 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1188/ 159576 | consumed samples: 19008 | elapsed time per iteration (ms): 13606.5 | learning rate: 5.272E-06 | global batch size: 16 | lm loss: 7.258739E+00 | loss scale: 16384.0 | grad norm: 126790.154 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1189/ 159576 | consumed samples: 19024 | elapsed time per iteration (ms): 13571.9 | learning rate: 5.277E-06 | global batch size: 16 | lm loss: 7.184493E+00 | loss scale: 16384.0 | grad norm: 84818.036 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1190/ 159576 | consumed samples: 19040 | elapsed time per iteration (ms): 13962.3 | learning rate: 5.281E-06 | global batch size: 16 | lm loss: 7.209998E+00 | loss scale: 16384.0 | grad norm: 131280.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1191/ 159576 | consumed samples: 19056 | elapsed time per iteration (ms): 13770.8 | learning rate: 5.286E-06 | global batch size: 16 | lm loss: 7.406217E+00 | loss scale: 16384.0 | grad norm: 110178.484 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1192/ 159576 | consumed samples: 19072 | elapsed time per iteration (ms): 13665.3 | learning rate: 5.290E-06 | global batch size: 16 | lm loss: 7.350411E+00 | loss scale: 16384.0 | grad norm: 81228.032 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1193/ 159576 | consumed samples: 19088 | elapsed time per iteration (ms): 13585.9 | learning rate: 5.294E-06 | global batch size: 16 | lm loss: 7.583058E+00 | loss scale: 16384.0 | grad norm: 291080.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1194/ 159576 | consumed samples: 19104 | elapsed time per iteration (ms): 13658.0 | learning rate: 5.299E-06 | global batch size: 16 | lm loss: 7.808938E+00 | loss scale: 16384.0 | grad norm: 193632.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1195/ 159576 | consumed samples: 19120 | elapsed time per iteration (ms): 13777.0 | learning rate: 5.303E-06 | global batch size: 16 | lm loss: 7.459247E+00 | loss scale: 16384.0 | grad norm: 100738.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1196/ 159576 | consumed samples: 19136 | elapsed time per iteration (ms): 13624.3 | learning rate: 5.308E-06 | global batch size: 16 | lm loss: 7.240894E+00 | loss scale: 16384.0 | grad norm: 102223.561 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1197/ 159576 | consumed samples: 19152 | elapsed time per iteration (ms): 13630.2 | learning rate: 5.312E-06 | global batch size: 16 | lm loss: 7.469604E+00 | loss scale: 16384.0 | grad norm: 91547.502 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1198/ 159576 | consumed samples: 19168 | elapsed time per iteration (ms): 13603.4 | learning rate: 5.317E-06 | global batch size: 16 | lm loss: 7.399169E+00 | loss scale: 16384.0 | grad norm: 246196.581 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1199/ 159576 | consumed samples: 19184 | elapsed time per iteration (ms): 14028.5 | learning rate: 5.321E-06 | global batch size: 16 | lm loss: 7.465099E+00 | loss scale: 16384.0 | grad norm: 185665.583 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1200/ 159576 | consumed samples: 19200 | elapsed time per iteration (ms): 13601.1 | learning rate: 5.325E-06 | global batch size: 16 | lm loss: 7.383169E+00 | loss scale: 16384.0 | grad norm: 115872.720 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1201/ 159576 | consumed samples: 19216 | elapsed time per iteration (ms): 13566.6 | learning rate: 5.330E-06 | global batch size: 16 | lm loss: 7.352910E+00 | loss scale: 16384.0 | grad norm: 114834.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
norm: 114834.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1202/ 159576 | consumed samples: 19232 | elapsed time per iteration (ms): 13557.4 | learning rate: 5.334E-06 | global batch size: 16 | lm loss: 7.521720E+00 | loss scale: 16384.0 | grad norm: 101976.012 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1203/ 159576 | consumed samples: 19248 | elapsed time per iteration (ms): 13525.0 | learning rate: 5.339E-06 | global batch size: 16 | lm loss: 7.225696E+00 | loss scale: 16384.0 | grad norm: 178745.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1204/ 159576 | consumed samples: 19264 | elapsed time per iteration (ms): 13539.3 | learning rate: 5.343E-06 | global batch size: 16 | lm loss: 7.375963E+00 | loss scale: 16384.0 | grad norm: 175723.616 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1205/ 159576 | consumed samples: 19280 | elapsed time per iteration (ms): 13532.3 | learning rate: 5.348E-06 | global batch size: 16 | lm loss: 7.402988E+00 | loss scale: 16384.0 | grad norm: 104645.448 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1206/ 159576 | consumed samples: 19296 | elapsed time per iteration (ms): 13502.9 | learning rate: 5.352E-06 | global batch size: 16 | lm loss: 7.302839E+00 | loss scale: 16384.0 | grad norm: 99328.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1207/ 159576 | consumed samples: 19312 | elapsed time per iteration (ms): 13540.4 | learning rate: 5.357E-06 | global batch size: 16 | lm loss: 7.555269E+00 | loss scale: 16384.0 | grad norm: 89166.858 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1208/ 159576 | consumed samples: 19328 | elapsed time per iteration (ms): 13900.0 | learning rate: 5.361E-06 | global batch size: 16 | lm loss: 7.459805E+00 | loss scale: 16384.0 | grad norm: 135152.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1209/ 159576 | consumed samples: 19344 | elapsed time per iteration (ms): 13560.6 | learning rate: 5.365E-06 | global batch size: 16 | lm loss: 7.419579E+00 | loss scale: 16384.0 | grad norm: 101249.512 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1210/ 159576 | consumed samples: 19360 | elapsed time per iteration (ms): 13658.8 | learning rate: 5.370E-06 | global batch size: 16 | lm loss: 7.348646E+00 | loss scale: 16384.0 | grad norm: 104483.609 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1211/ 159576 | consumed samples: 19376 | elapsed time per iteration (ms): 13533.6 | learning rate: 5.374E-06 | global batch size: 16 | lm loss: 7.494230E+00 | loss scale: 16384.0 | grad norm: 110210.437 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1212/ 159576 | consumed samples: 19392 | elapsed time per iteration (ms): 13905.0 | learning rate: 5.379E-06 | global batch size: 16 | lm loss: 7.390188E+00 | loss scale: 16384.0 | grad norm: 96645.582 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1213/ 159576 | consumed samples: 19408 | elapsed time per iteration 
(ms): 13673.2 | learning rate: 5.383E-06 | global batch size: 16 | lm loss: 7.318599E+00 | loss scale: 16384.0 | grad norm: 166216.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1214/ 159576 | consumed samples: 19424 | elapsed time per iteration (ms): 13582.9 | learning rate: 5.388E-06 | global batch size: 16 | lm loss: 7.262068E+00 | loss scale: 16384.0 | grad norm: 75724.522 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1215/ 159576 | consumed samples: 19440 | elapsed time per iteration (ms): 13570.1 | learning rate: 5.392E-06 | global batch size: 16 | lm loss: 7.594563E+00 | loss scale: 16384.0 | grad norm: 95306.819 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1216/ 159576 | consumed samples: 19456 | elapsed time per iteration (ms): 13639.7 | learning rate: 5.396E-06 | global batch size: 16 | lm loss: 7.375734E+00 | loss scale: 16384.0 | grad norm: 86152.125 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1217/ 159576 | consumed samples: 19472 | elapsed time per iteration (ms): 14091.6 | learning rate: 5.401E-06 | global batch size: 16 | lm loss: 7.213047E+00 | loss scale: 16384.0 | grad norm: 95583.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1218/ 159576 | consumed samples: 19488 | elapsed time per iteration (ms): 13516.3 | learning rate: 5.405E-06 | global batch size: 16 | lm loss: 7.437682E+00 | loss scale: 16384.0 | grad norm: 221549.634 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1219/ 159576 | consumed samples: 19504 | elapsed time per iteration (ms): 13610.0 | learning rate: 5.410E-06 | global batch size: 16 | lm loss: 7.254605E+00 | loss scale: 16384.0 | grad norm: 97554.516 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1220/ 159576 | consumed samples: 19520 | elapsed time per iteration (ms): 13565.5 | learning rate: 5.414E-06 | global batch size: 16 | lm loss: 7.248229E+00 | loss scale: 16384.0 | grad norm: 89138.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1221/ 159576 | consumed samples: 19536 | elapsed time per iteration (ms): 13989.3 | learning rate: 5.419E-06 | global batch size: 16 | lm loss: 7.313151E+00 | loss scale: 16384.0 | grad norm: 172651.828 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1222/ 159576 | consumed samples: 19552 | elapsed time per iteration (ms): 13602.4 | learning rate: 5.423E-06 | global batch size: 16 | lm loss: 7.476789E+00 | loss scale: 16384.0 | grad norm: 67387.822 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1223/ 159576 | consumed samples: 19568 | elapsed time per iteration (ms): 13656.0 | learning rate: 5.428E-06 | global batch size: 16 | lm loss: 7.289939E+00 | loss scale: 16384.0 | grad norm: 207125.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1224/ 159576 | consumed samples: 19584 | elapsed time per iteration (ms): 13537.8 | learning rate: 5.432E-06 | global batch size: 16 | lm loss: 7.409894E+00 | loss scale: 16384.0 | grad norm: 156218.537 | num zeros: 0.0 | number of skipped iterations: 0 | number 
of nan iterations: 0 | time (ms) iteration 1225/ 159576 | consumed samples: 19600 | elapsed time per iteration (ms): 13600.0 | learning rate: 5.436E-06 | global batch size: 16 | lm loss: 7.226832E+00 | loss scale: 16384.0 | grad norm: 93258.536 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1226/ 159576 | consumed samples: 19616 | elapsed time per iteration (ms): 13778.7 | learning rate: 5.441E-06 | global batch size: 16 | lm loss: 7.406470E+00 | loss scale: 16384.0 | grad norm: 95037.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1227/ 159576 | consumed samples: 19632 | elapsed time per iteration (ms): 13609.5 | learning rate: 5.445E-06 | global batch size: 16 | lm loss: 7.385060E+00 | loss scale: 16384.0 | grad norm: 77831.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1228/ 159576 | consumed samples: 19648 | elapsed time per iteration (ms): 13561.8 | learning rate: 5.450E-06 | global batch size: 16 | lm loss: 7.283795E+00 | loss scale: 16384.0 | grad norm: 219813.514 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1229/ 159576 | consumed samples: 19664 | elapsed time per iteration (ms): 13619.4 | learning rate: 5.454E-06 | global batch size: 16 | lm loss: 7.344219E+00 | loss scale: 16384.0 | grad norm: 122192.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1230/ 159576 | consumed samples: 19680 | elapsed time per iteration (ms): 14054.6 | learning rate: 5.459E-06 | global batch size: 16 | lm loss: 7.364305E+00 | loss scale: 16384.0 | grad norm: 90944.731 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1231/ 159576 | consumed samples: 19696 | elapsed time per iteration (ms): 13589.9 | learning rate: 5.463E-06 | global batch size: 16 | lm loss: 7.421730E+00 | loss scale: 16384.0 | grad norm: 178816.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1232/ 159576 | consumed samples: 19712 | elapsed time per iteration (ms): 13624.6 | learning rate: 5.467E-06 | global batch size: 16 | lm loss: 7.278720E+00 | loss scale: 16384.0 | grad norm: 101190.498 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1233/ 159576 | consumed samples: 19728 | elapsed time per iteration (ms): 13574.7 | learning rate: 5.472E-06 | global batch size: 16 | lm loss: 7.525582E+00 | loss scale: 16384.0 | grad norm: 95476.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1234/ 159576 | consumed samples: 19744 | elapsed time per iteration (ms): 13981.0 | learning rate: 5.476E-06 | global batch size: 16 | lm loss: 7.294508E+00 | loss scale: 16384.0 | grad norm: 110379.726 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1235/ 159576 | consumed samples: 19760 | elapsed time per iteration (ms): 13641.1 | learning rate: 5.481E-06 | global batch size: 16 | lm loss: 7.431972E+00 | loss scale: 16384.0 | grad norm: 103188.497 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1236/ 159576 | consumed samples: 19776 | elapsed time per iteration (ms): 13575.4 | learning rate: 5.485E-06 | global batch size: 16 | lm loss: 
7.397687E+00 | loss scale: 16384.0 | grad norm: 92125.975 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1237/ 159576 | consumed samples: 19792 | elapsed time per iteration (ms): 13672.0 | learning rate: 5.490E-06 | global batch size: 16 | lm loss: 7.314774E+00 | loss scale: 16384.0 | grad norm: 75870.645 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1238/ 159576 | consumed samples: 19808 | elapsed time per iteration (ms): 13509.4 | learning rate: 5.494E-06 | global batch size: 16 | lm loss: 7.187806E+00 | loss scale: 16384.0 | grad norm: 173296.806 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1239/ 159576 | consumed samples: 19824 | elapsed time per iteration (ms): 13875.3 | learning rate: 5.499E-06 | global batch size: 16 | lm loss: 7.376097E+00 | loss scale: 16384.0 | grad norm: 133632.906 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1240/ 159576 | consumed samples: 19840 | elapsed time per iteration (ms): 13610.1 | learning rate: 5.503E-06 | global batch size: 16 | lm loss: 7.267582E+00 | loss scale: 16384.0 | grad norm: 85104.985 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1241/ 159576 | consumed samples: 19856 | elapsed time per iteration (ms): 13551.5 | learning rate: 5.507E-06 | global batch size: 16 | lm loss: 7.352735E+00 | loss scale: 16384.0 | grad norm: 90699.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1242/ 159576 | consumed samples: 19872 | elapsed time per iteration (ms): 13593.9 | learning rate: 5.512E-06 | global batch size: 16 | lm loss: 7.468503E+00 | loss scale: 16384.0 | grad norm: 83188.176 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1243/ 159576 | consumed samples: 19888 | elapsed time per iteration (ms): 13930.9 | learning rate: 5.516E-06 | global batch size: 16 | lm loss: 7.214951E+00 | loss scale: 16384.0 | grad norm: 78366.480 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1244/ 159576 | consumed samples: 19904 | elapsed time per iteration (ms): 13652.1 | learning rate: 5.521E-06 | global batch size: 16 | lm loss: 7.260246E+00 | loss scale: 16384.0 | grad norm: 80928.941 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-24 07:03:47] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1162855_[2-10%1] on 'gpu_p13' partition) [2021-09-24 07:03:47] PULSE: tr8-104B is running for 1:11:36 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8]) iteration 1245/ 159576 | consumed samples: 19920 | elapsed time per iteration (ms): 13521.2 | learning rate: 5.525E-06 | global batch size: 16 | lm loss: 7.539850E+00 | loss scale: 16384.0 | grad norm: 85379.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1246/ 159576 | consumed samples: 19936 | elapsed time per iteration (ms): 13540.5 | learning rate: 5.530E-06 | global batch size: 16 | lm 
loss: 7.541747E+00 | loss scale: 16384.0 | grad norm: 112594.519 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1247/ 159576 | consumed samples: 19952 | elapsed time per iteration (ms): 13599.8 | learning rate: 5.534E-06 | global batch size: 16 | lm loss: 7.427727E+00 | loss scale: 16384.0 | grad norm: 75830.490 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1248/ 159576 | consumed samples: 19968 | elapsed time per iteration (ms): 13827.8 | learning rate: 5.538E-06 | global batch size: 16 | lm loss: 7.407825E+00 | loss scale: 16384.0 | grad norm: 125194.168 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1249/ 159576 | consumed samples: 19984 | elapsed time per iteration (ms): 13505.2 | learning rate: 5.543E-06 | global batch size: 16 | lm loss: 7.566711E+00 | loss scale: 16384.0 | grad norm: 116825.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1250/ 159576 | consumed samples: 20000 | elapsed time per iteration (ms): 13584.6 | learning rate: 5.547E-06 | global batch size: 16 | lm loss: 7.156756E+00 | loss scale: 16384.0 | grad norm: 75875.506 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1251/ 159576 | consumed samples: 20016 | elapsed time per iteration (ms): 13599.4 | learning rate: 5.552E-06 | global batch size: 16 | lm loss: 7.355666E+00 | loss scale: 16384.0 | grad norm: 128516.877 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1252/ 159576 | consumed samples: 20032 | elapsed time per iteration (ms): 13882.6 | learning rate: 5.556E-06 | global batch size: 16 | lm loss: 7.339529E+00 | loss scale: 16384.0 | grad norm: 92000.517 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1253/ 159576 | consumed samples: 20048 | elapsed time per iteration (ms): 13669.5 | learning rate: 5.561E-06 | global batch size: 16 | lm loss: 7.246970E+00 | loss scale: 16384.0 | grad norm: 68938.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1254/ 159576 | consumed samples: 20064 | elapsed time per iteration (ms): 13534.9 | learning rate: 5.565E-06 | global batch size: 16 | lm loss: 7.505607E+00 | loss scale: 16384.0 | grad norm: 103078.555 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1255/ 159576 | consumed samples: 20080 | elapsed time per iteration (ms): 13594.8 | learning rate: 5.570E-06 | global batch size: 16 | lm loss: 7.386476E+00 | loss scale: 16384.0 | grad norm: 104529.887 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1256/ 159576 | consumed samples: 20096 | elapsed time per iteration (ms): 13795.8 | learning rate: 5.574E-06 | global batch size: 16 | lm loss: 7.263406E+00 | loss scale: 16384.0 | grad norm: 82840.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1257/ 159576 | consumed samples: 20112 | elapsed time per iteration (ms): 13529.7 | learning rate: 5.578E-06 | global batch size: 16 | lm loss: 7.356731E+00 | loss scale: 16384.0 | grad norm: 64612.754 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1258/ 159576 | consumed 
samples: 20128 | elapsed time per iteration (ms): 13538.7 | learning rate: 5.583E-06 | global batch size: 16 | lm loss: 7.516888E+00 | loss scale: 16384.0 | grad norm: 136048.030 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1259/ 159576 | consumed samples: 20144 | elapsed time per iteration (ms): 13556.0 | learning rate: 5.587E-06 | global batch size: 16 | lm loss: 7.352553E+00 | loss scale: 16384.0 | grad norm: 81380.126 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1260/ 159576 | consumed samples: 20160 | elapsed time per iteration (ms): 13488.1 | learning rate: 5.592E-06 | global batch size: 16 | lm loss: 7.385587E+00 | loss scale: 16384.0 | grad norm: 121637.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1261/ 159576 | consumed samples: 20176 | elapsed time per iteration (ms): 13803.4 | learning rate: 5.596E-06 | global batch size: 16 | lm loss: 7.280743E+00 | loss scale: 16384.0 | grad norm: 89726.532 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1262/ 159576 | consumed samples: 20192 | elapsed time per iteration (ms): 13426.2 | learning rate: 5.601E-06 | global batch size: 16 | lm loss: 7.512013E+00 | loss scale: 16384.0 | grad norm: 85518.754 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1263/ 159576 | consumed samples: 20208 | elapsed time per iteration (ms): 13492.1 | learning rate: 5.605E-06 | global batch size: 16 | lm loss: 7.145048E+00 | loss scale: 16384.0 | grad norm: 112279.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1264/ 159576 | consumed samples: 20224 | elapsed time per iteration (ms): 13537.9 | learning rate: 5.609E-06 | global batch size: 16 | lm loss: 7.608912E+00 | loss scale: 16384.0 | grad norm: 96612.876 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1265/ 159576 | consumed samples: 20240 | elapsed time per iteration (ms): 13857.6 | learning rate: 5.614E-06 | global batch size: 16 | lm loss: 7.316525E+00 | loss scale: 16384.0 | grad norm: 73736.489 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1266/ 159576 | consumed samples: 20256 | elapsed time per iteration (ms): 13475.3 | learning rate: 5.618E-06 | global batch size: 16 | lm loss: 7.406303E+00 | loss scale: 16384.0 | grad norm: 69485.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1267/ 159576 | consumed samples: 20272 | elapsed time per iteration (ms): 13513.4 | learning rate: 5.623E-06 | global batch size: 16 | lm loss: 7.282144E+00 | loss scale: 16384.0 | grad norm: 72619.526 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1268/ 159576 | consumed samples: 20288 | elapsed time per iteration (ms): 13517.8 | learning rate: 5.627E-06 | global batch size: 16 | lm loss: 7.419368E+00 | loss scale: 16384.0 | grad norm: 107085.697 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1269/ 159576 | consumed samples: 20304 | elapsed time per iteration (ms): 13507.2 | learning rate: 5.632E-06 | global batch size: 16 | lm loss: 7.427319E+00 | loss scale: 16384.0 | grad norm: 75455.531 | num zeros: 0.0 
| number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1270/ 159576 | consumed samples: 20320 | elapsed time per iteration (ms): 13744.8 | learning rate: 5.636E-06 | global batch size: 16 | lm loss: 7.348005E+00 | loss scale: 16384.0 | grad norm: 119801.062 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1271/ 159576 | consumed samples: 20336 | elapsed time per iteration (ms): 13569.3 | learning rate: 5.641E-06 | global batch size: 16 | lm loss: 7.365005E+00 | loss scale: 16384.0 | grad norm: 64957.880 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1272/ 159576 | consumed samples: 20352 | elapsed time per iteration (ms): 13569.6 | learning rate: 5.645E-06 | global batch size: 16 | lm loss: 7.429317E+00 | loss scale: 16384.0 | grad norm: 178872.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1273/ 159576 | consumed samples: 20368 | elapsed time per iteration (ms): 13472.8 | learning rate: 5.649E-06 | global batch size: 16 | lm loss: 7.312444E+00 | loss scale: 16384.0 | grad norm: 131489.957 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1274/ 159576 | consumed samples: 20384 | elapsed time per iteration (ms): 14043.7 | learning rate: 5.654E-06 | global batch size: 16 | lm loss: 7.280907E+00 | loss scale: 16384.0 | grad norm: 80742.529 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1275/ 159576 | consumed samples: 20400 | elapsed time per iteration (ms): 13515.6 | learning rate: 5.658E-06 | global batch size: 16 | lm loss: 7.473969E+00 | loss scale: 16384.0 | grad norm: 192617.575 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1276/ 159576 | consumed samples: 20416 | elapsed time per iteration (ms): 13555.1 | learning rate: 5.663E-06 | global batch size: 16 | lm loss: 7.571683E+00 | loss scale: 16384.0 | grad norm: 142231.827 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1277/ 159576 | consumed samples: 20432 | elapsed time per iteration (ms): 13684.0 | learning rate: 5.667E-06 | global batch size: 16 | lm loss: 7.370350E+00 | loss scale: 16384.0 | grad norm: 91290.772 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1278/ 159576 | consumed samples: 20448 | elapsed time per iteration (ms): 14108.9 | learning rate: 5.672E-06 | global batch size: 16 | lm loss: 7.258504E+00 | loss scale: 16384.0 | grad norm: 111985.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1279/ 159576 | consumed samples: 20464 | elapsed time per iteration (ms): 13599.8 | learning rate: 5.676E-06 | global batch size: 16 | lm loss: 7.378584E+00 | loss scale: 16384.0 | grad norm: 101238.659 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1280/ 159576 | consumed samples: 20480 | elapsed time per iteration (ms): 13689.3 | learning rate: 5.680E-06 | global batch size: 16 | lm loss: 7.344358E+00 | loss scale: 16384.0 | grad norm: 131175.820 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1281/ 159576 | consumed samples: 20496 | elapsed time per iteration (ms): 13675.0 | learning rate: 
5.685E-06 | global batch size: 16 | lm loss: 7.253249E+00 | loss scale: 16384.0 | grad norm: 81245.877 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1282/ 159576 | consumed samples: 20512 | elapsed time per iteration (ms): 13723.8 | learning rate: 5.689E-06 | global batch size: 16 | lm loss: 7.385771E+00 | loss scale: 16384.0 | grad norm: 80281.812 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1283/ 159576 | consumed samples: 20528 | elapsed time per iteration (ms): 13839.8 | learning rate: 5.694E-06 | global batch size: 16 | lm loss: 7.253633E+00 | loss scale: 16384.0 | grad norm: 106168.685 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1284/ 159576 | consumed samples: 20544 | elapsed time per iteration (ms): 13645.0 | learning rate: 5.698E-06 | global batch size: 16 | lm loss: 7.091393E+00 | loss scale: 16384.0 | grad norm: 119249.818 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1285/ 159576 | consumed samples: 20560 | elapsed time per iteration (ms): 13673.3 | learning rate: 5.703E-06 | global batch size: 16 | lm loss: 7.346157E+00 | loss scale: 16384.0 | grad norm: 87118.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1286/ 159576 | consumed samples: 20576 | elapsed time per iteration (ms): 13680.7 | learning rate: 5.707E-06 | global batch size: 16 | lm loss: 7.301017E+00 | loss scale: 16384.0 | grad norm: 66813.094 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1287/ 159576 | consumed samples: 20592 | elapsed time per iteration (ms): 14107.0 | learning rate: 5.712E-06 | global batch size: 16 | lm loss: 7.228415E+00 | loss scale: 16384.0 | grad norm: 90274.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1288/ 159576 | consumed samples: 20608 | elapsed time per iteration (ms): 13593.6 | learning rate: 5.716E-06 | global batch size: 16 | lm loss: 7.412420E+00 | loss scale: 16384.0 | grad norm: 74854.970 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1289/ 159576 | consumed samples: 20624 | elapsed time per iteration (ms): 13657.4 | learning rate: 5.720E-06 | global batch size: 16 | lm loss: 7.296477E+00 | loss scale: 16384.0 | grad norm: 78756.807 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1290/ 159576 | consumed samples: 20640 | elapsed time per iteration (ms): 13628.7 | learning rate: 5.725E-06 | global batch size: 16 | lm loss: 7.091270E+00 | loss scale: 16384.0 | grad norm: 77550.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1291/ 159576 | consumed samples: 20656 | elapsed time per iteration (ms): 13654.9 | learning rate: 5.729E-06 | global batch size: 16 | lm loss: 7.247941E+00 | loss scale: 16384.0 | grad norm: 140565.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1292/ 159576 | consumed samples: 20672 | elapsed time per iteration (ms): 13789.5 | learning rate: 5.734E-06 | global batch size: 16 | lm loss: 7.326149E+00 | loss scale: 16384.0 | grad norm: 66170.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) 
iteration 1293/ 159576 | consumed samples: 20688 | elapsed time per iteration (ms): 13629.3 | learning rate: 5.738E-06 | global batch size: 16 | lm loss: 7.358797E+00 | loss scale: 16384.0 | grad norm: 94692.189 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1294/ 159576 | consumed samples: 20704 | elapsed time per iteration (ms): 13584.0 | learning rate: 5.743E-06 | global batch size: 16 | lm loss: 7.254357E+00 | loss scale: 16384.0 | grad norm: 69169.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1295/ 159576 | consumed samples: 20720 | elapsed time per iteration (ms): 13612.6 | learning rate: 5.747E-06 | global batch size: 16 | lm loss: 7.449785E+00 | loss scale: 16384.0 | grad norm: 180039.609 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1296/ 159576 | consumed samples: 20736 | elapsed time per iteration (ms): 13948.4 | learning rate: 5.751E-06 | global batch size: 16 | lm loss: 7.506041E+00 | loss scale: 16384.0 | grad norm: 147606.074 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1297/ 159576 | consumed samples: 20752 | elapsed time per iteration (ms): 13604.2 | learning rate: 5.756E-06 | global batch size: 16 | lm loss: 7.265352E+00 | loss scale: 16384.0 | grad norm: 87511.848 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1298/ 159576 | consumed samples: 20768 | elapsed time per iteration (ms): 13622.0 | learning rate: 5.760E-06 | global batch size: 16 | lm loss: 7.446327E+00 | loss scale: 16384.0 | grad norm: 91155.668 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1299/ 159576 | consumed samples: 20784 | elapsed time per iteration (ms): 13674.5 | learning rate: 5.765E-06 | global batch size: 16 | lm loss: 7.469901E+00 | loss scale: 16384.0 | grad norm: 219048.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1300/ 159576 | consumed samples: 20800 | elapsed time per iteration (ms): 13848.4 | learning rate: 5.769E-06 | global batch size: 16 | lm loss: 7.389014E+00 | loss scale: 16384.0 | grad norm: 84402.094 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1301/ 159576 | consumed samples: 20816 | elapsed time per iteration (ms): 13625.0 | learning rate: 5.774E-06 | global batch size: 16 | lm loss: 7.303530E+00 | loss scale: 16384.0 | grad norm: 174901.504 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1302/ 159576 | consumed samples: 20832 | elapsed time per iteration (ms): 13624.5 | learning rate: 5.778E-06 | global batch size: 16 | lm loss: 7.358258E+00 | loss scale: 16384.0 | grad norm: 146018.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1303/ 159576 | consumed samples: 20848 | elapsed time per iteration (ms): 13602.8 | learning rate: 5.783E-06 | global batch size: 16 | lm loss: 7.337800E+00 | loss scale: 16384.0 | grad norm: 109327.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1304/ 159576 | consumed samples: 20864 | elapsed time per iteration (ms): 13628.1 | learning rate: 5.787E-06 | global batch size: 16 | lm loss: 7.310088E+00 | loss scale: 16384.0 | grad norm: 83547.733 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1305/ 159576 | consumed samples: 20880 | elapsed time per iteration (ms): 13754.8 | learning rate: 5.791E-06 | global batch size: 16 | lm loss: 7.464965E+00 | loss scale: 16384.0 | grad norm: 695515.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1306/ 159576 | consumed samples: 20896 | elapsed time per iteration (ms): 13652.7 | learning rate: 5.796E-06 | global batch size: 16 | lm loss: 7.764376E+00 | loss scale: 16384.0 | grad norm: 569876.871 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1307/ 159576 | consumed samples: 20912 | elapsed time per iteration (ms): 13609.0 | learning rate: 5.800E-06 | global batch size: 16 | lm loss: 7.550226E+00 | loss scale: 16384.0 | grad norm: 356748.186 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1308/ 159576 | consumed samples: 20928 | elapsed time per iteration (ms): 13602.6 | learning rate: 5.805E-06 | global batch size: 16 | lm loss: 7.402792E+00 | loss scale: 16384.0 | grad norm: 159267.929 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1309/ 159576 | consumed samples: 20944 | elapsed time per iteration (ms): 13968.8 | learning rate: 5.809E-06 | global batch size: 16 | lm loss: 7.204682E+00 | loss scale: 16384.0 | grad norm: 129995.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1310/ 159576 | consumed samples: 20960 | elapsed time per iteration (ms): 13646.5 | learning rate: 5.814E-06 | global batch size: 16 | lm loss: 7.591084E+00 | loss scale: 16384.0 | grad norm: 143380.550 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1311/ 159576 | consumed samples: 20976 | elapsed time per iteration (ms): 13595.1 | learning rate: 5.818E-06 | global batch size: 16 | lm loss: 7.316426E+00 | loss scale: 16384.0 | grad norm: 150593.992 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1312/ 159576 | consumed samples: 20992 | elapsed time per iteration (ms): 13595.5 | learning rate: 5.822E-06 | global batch size: 16 | lm loss: 7.305964E+00 | loss scale: 16384.0 | grad norm: 177049.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1313/ 159576 | consumed samples: 21008 | elapsed time per iteration (ms): 13979.9 | learning rate: 5.827E-06 | global batch size: 16 | lm loss: 7.567747E+00 | loss scale: 16384.0 | grad norm: 169809.702 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1314/ 159576 | consumed samples: 21024 | elapsed time per iteration (ms): 13640.7 | learning rate: 5.831E-06 | global batch size: 16 | lm loss: 7.395080E+00 | loss scale: 16384.0 | grad norm: 145564.791 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1315/ 159576 | consumed samples: 21040 | elapsed time per iteration (ms): 13592.0 | learning rate: 5.836E-06 | global batch size: 16 | lm loss: 7.317047E+00 | loss scale: 16384.0 | grad norm: 104694.703 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1316/ 159576 | consumed samples: 21056 | elapsed time per iteration (ms): 13586.9 | learning rate: 5.840E-06 | global batch size: 16 | lm loss: 7.255484E+00 | loss scale: 16384.0 | grad norm: 93976.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1317/ 159576 | consumed samples: 21072 | elapsed time per iteration (ms): 13589.9 | learning rate: 5.845E-06 | global batch size: 16 | lm loss: 7.440733E+00 | loss scale: 16384.0 | grad norm: 181969.447 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1318/ 159576 | consumed samples: 21088 | elapsed time per iteration (ms): 13777.5 | learning rate: 5.849E-06 | global batch size: 16 | lm loss: 7.425194E+00 | loss scale: 16384.0 | grad norm: 109784.173 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1319/ 159576 | consumed samples: 21104 | elapsed time per iteration (ms): 13622.9 | learning rate: 5.854E-06 | global batch size: 16 | lm loss: 7.338997E+00 | loss scale: 16384.0 | grad norm: 146618.704 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1320/ 159576 | consumed samples: 21120 | elapsed time per iteration (ms): 13655.9 | learning rate: 5.858E-06 | global batch size: 16 | lm loss: 7.517268E+00 | loss scale: 16384.0 | grad norm: 108508.882 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1321/ 159576 | consumed samples: 21136 | elapsed time per iteration (ms): 13535.6 | learning rate: 5.862E-06 | global batch size: 16 | lm loss: 7.358712E+00 | loss scale: 16384.0 | grad norm: 100699.582 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1322/ 159576 | consumed samples: 21152 | elapsed time per iteration (ms): 13935.1 | learning rate: 5.867E-06 | global batch size: 16 | lm loss: 7.184452E+00 | loss scale: 16384.0 | grad norm: 85896.066 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1323/ 159576 | consumed samples: 21168 | elapsed time per iteration (ms): 13612.2 | learning rate: 5.871E-06 | global batch size: 16 | lm loss: 7.388680E+00 | loss scale: 16384.0 | grad norm: 283765.557 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1324/ 159576 | consumed samples: 21184 | elapsed time per iteration (ms): 13600.2 | learning rate: 5.876E-06 | global batch size: 16 | lm loss: 7.594103E+00 | loss scale: 16384.0 | grad norm: 191758.573 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1325/ 159576 | consumed samples: 21200 | elapsed time per iteration (ms): 13592.0 | learning rate: 5.880E-06 | global batch size: 16 | lm loss: 7.443296E+00 | loss scale: 16384.0 | grad norm: 112255.550 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1326/ 159576 | consumed samples: 21216 | elapsed time per iteration (ms): 13594.2 | learning rate: 5.885E-06 | global batch size: 16 | lm loss: 7.192332E+00 | loss scale: 16384.0 | grad norm: 110320.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1327/ 159576 | consumed samples: 21232 | elapsed time per iteration (ms): 13762.8 | learning rate: 5.889E-06 | global batch size: 16 | lm loss: 8.096416E+00 | loss scale: 16384.0 | grad norm: 131448.164 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1328/ 159576 | consumed samples: 21248 | elapsed time per iteration (ms): 13579.8 | learning rate: 5.893E-06 | global batch size: 16 | lm loss: 7.433802E+00 | loss scale: 16384.0 | grad norm: 182837.970 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1329/ 159576 | consumed samples: 21264 | elapsed time per iteration (ms): 13581.7 | learning rate: 5.898E-06 | global batch size: 16 | lm loss: 7.172110E+00 | loss scale: 16384.0 | grad norm: 100348.173 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1330/ 159576 | consumed samples: 21280 | elapsed time per iteration (ms): 13583.6 | learning rate: 5.902E-06 | global batch size: 16 | lm loss: 7.240623E+00 | loss scale: 16384.0 | grad norm: 100150.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1331/ 159576 | consumed samples: 21296 | elapsed time per iteration (ms): 14102.4 | learning rate: 5.907E-06 | global batch size: 16 | lm loss: 7.203824E+00 | loss scale: 16384.0 | grad norm: 241560.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1332/ 159576 | consumed samples: 21312 | elapsed time per iteration (ms): 13644.3 | learning rate: 5.911E-06 | global batch size: 16 | lm loss: 7.245723E+00 | loss scale: 16384.0 | grad norm: 129411.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1333/ 159576 | consumed samples: 21328 | elapsed time per iteration (ms): 13656.6 | learning rate: 5.916E-06 | global batch size: 16 | lm loss: 7.574631E+00 | loss scale: 16384.0 | grad norm: 172987.034 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1334/ 159576 | consumed samples: 21344 | elapsed time per iteration (ms): 13588.8 | learning rate: 5.920E-06 | global batch size: 16 | lm loss: 7.287757E+00 | loss scale: 16384.0 | grad norm: 99651.568 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1335/ 159576 | consumed samples: 21360 | elapsed time per iteration (ms): 14011.8 | learning rate: 5.925E-06 | global batch size: 16 | lm loss: 7.268057E+00 | loss scale: 16384.0 | grad norm: 109280.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1336/ 159576 | consumed samples: 21376 | elapsed time per iteration (ms): 13624.4 | learning rate: 5.929E-06 | global batch size: 16 | lm loss: 7.062439E+00 | loss scale: 16384.0 | grad norm: 160438.049 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1337/ 159576 | consumed samples: 21392 | elapsed time per iteration (ms): 13544.1 | learning rate: 5.933E-06 | global batch size: 16 | lm loss: 7.233086E+00 | loss scale: 16384.0 | grad norm: 175313.966 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1338/ 159576 | consumed samples: 21408 | elapsed time per iteration (ms): 13619.6 | learning rate: 5.938E-06 | global batch size: 16 | lm loss: 7.333053E+00 | loss scale: 16384.0 | grad norm: 104091.148 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1339/ 159576 | consumed samples: 21424 | elapsed time per iteration (ms): 13622.4 | learning rate: 5.942E-06 | global batch size: 16 | lm loss: 7.263519E+00 | loss scale: 16384.0 | grad norm: 90175.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1340/ 159576 | consumed samples: 21440 | elapsed time per iteration (ms): 13736.6 | learning rate: 5.947E-06 | global batch size: 16 | lm loss: 7.445864E+00 | loss scale: 16384.0 | grad norm: 136689.970 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1341/ 159576 | consumed samples: 21456 | elapsed time per iteration (ms): 13686.3 | learning rate: 5.951E-06 | global batch size: 16 | lm loss: 7.362231E+00 | loss scale: 16384.0 | grad norm: 184602.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1342/ 159576 | consumed samples: 21472 | elapsed time per iteration (ms): 13488.8 | learning rate: 5.956E-06 | global batch size: 16 | lm loss: 7.368071E+00 | loss scale: 16384.0 | grad norm: 82633.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1343/ 159576 | consumed samples: 21488 | elapsed time per iteration (ms): 13605.8 | learning rate: 5.960E-06 | global batch size: 16 | lm loss: 7.327272E+00 | loss scale: 16384.0 | grad norm: 92741.507 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1344/ 159576 | consumed samples: 21504 | elapsed time per iteration (ms): 14069.0 | learning rate: 5.964E-06 | global batch size: 16 | lm loss: 7.323634E+00 | loss scale: 16384.0 | grad norm: 99780.106 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1345/ 159576 | consumed samples: 21520 | elapsed time per iteration (ms): 13450.7 | learning rate: 5.969E-06 | global batch size: 16 | lm loss: 7.741362E+00 | loss scale: 16384.0 | grad norm: 105396.793 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1346/ 159576 | consumed samples: 21536 | elapsed time per iteration (ms): 13598.3 | learning rate: 5.973E-06 | global batch size: 16 | lm loss: 7.280247E+00 | loss scale: 16384.0 | grad norm: 77724.692 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1347/ 159576 | consumed samples: 21552 | elapsed time per iteration (ms): 13585.6 | learning rate: 5.978E-06 | global batch size: 16 | lm loss: 7.398378E+00 | loss scale: 16384.0 | grad norm: 69954.709 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1348/ 159576 | consumed samples: 21568 | elapsed time per iteration (ms): 13610.3 | learning rate: 5.982E-06 | global batch size: 16 | lm loss: 7.321609E+00 | loss scale: 16384.0 | grad norm: 94086.734 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1349/ 159576 | consumed samples: 21584 | elapsed time per iteration (ms): 13777.1 | learning rate: 5.987E-06 | global batch size: 16 | lm loss: 7.188628E+00 | loss scale: 16384.0 | grad norm: 81475.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1350/ 159576 | consumed samples: 21600 | elapsed time per iteration (ms): 13566.9 | learning rate: 5.991E-06 | global batch size: 16 | lm loss: 7.515175E+00 | loss scale: 16384.0 | grad norm: 78780.993 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1351/ 159576 | consumed samples: 21616 | elapsed time per iteration (ms): 13622.9 | learning rate: 5.996E-06 | global batch size: 16 | lm loss: 7.231083E+00 | loss scale: 16384.0 | grad norm: 86153.703 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1352/ 159576 | consumed samples: 21632 | elapsed time per iteration (ms): 13562.3 | learning rate: 6.000E-06 | global batch size: 16 | lm loss: 7.206710E+00 | loss scale: 16384.0 | grad norm: 83949.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1353/ 159576 | consumed samples: 21648 | elapsed time per iteration (ms): 13968.8 | learning rate: 6.004E-06 | global batch size: 16 | lm loss: 7.293135E+00 | loss scale: 16384.0 | grad norm: 83956.626 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1354/ 159576 | consumed samples: 21664 | elapsed time per iteration (ms): 13680.7 | learning rate: 6.009E-06 | global batch size: 16 | lm loss: 7.282973E+00 | loss scale: 16384.0 | grad norm: 102770.063 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1355/ 159576 | consumed samples: 21680 | elapsed time per iteration (ms): 13601.4 | learning rate: 6.013E-06 | global batch size: 16 | lm loss: 7.427012E+00 | loss scale: 16384.0 | grad norm: 87455.923 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1356/ 159576 | consumed samples: 21696 | elapsed time per iteration (ms): 13542.1 | learning rate: 6.018E-06 | global batch size: 16 | lm loss: 7.529208E+00 | loss scale: 16384.0 | grad norm: 83130.183 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1357/ 159576 | consumed samples: 21712 | elapsed time per iteration (ms): 13961.0 | learning rate: 6.022E-06 | global batch size: 16 | lm loss: 7.327049E+00 | loss scale: 16384.0 | grad norm: 77841.440 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1358/ 159576 | consumed samples: 21728 | elapsed time per iteration (ms): 13587.5 | learning rate: 6.027E-06 | global batch size: 16 | lm loss: 7.267120E+00 | loss scale: 16384.0 | grad norm: 86295.759 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1359/ 159576 | consumed samples: 21744 | elapsed time per iteration (ms): 13505.9 | learning rate: 6.031E-06 | global batch size: 16 | lm loss: 7.190462E+00 | loss scale: 16384.0 | grad norm: 154865.118 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1360/ 159576 | consumed samples: 21760 | elapsed time per iteration (ms): 13616.0 | learning rate: 6.036E-06 | global batch size: 16 | lm loss: 7.321602E+00 | loss scale: 16384.0 | grad norm: 112461.941 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1361/ 159576 | consumed samples: 21776 | elapsed time per iteration (ms): 13547.3 | learning rate: 6.040E-06 | global batch size: 16 | lm loss: 7.145373E+00 | loss scale: 16384.0 | grad norm: 72055.762 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1362/ 159576 | consumed samples: 21792 | elapsed time per iteration (ms): 13692.3 | learning rate: 6.044E-06 | global batch size: 16 | lm loss: 7.077173E+00 | loss scale: 16384.0 | grad norm: 103896.131 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1363/ 159576 | consumed samples: 21808 | elapsed time per iteration (ms): 13612.5 | learning rate: 6.049E-06 | global batch size: 16 | lm loss: 7.245114E+00 | loss scale: 16384.0 | grad norm: 79354.159 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1364/ 159576 | consumed samples: 21824 | elapsed time per iteration (ms): 13541.3 | learning rate: 6.053E-06 | global batch size: 16 | lm loss: 7.281060E+00 | loss scale: 16384.0 | grad norm: 148274.049 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1365/ 159576 | consumed samples: 21840 | elapsed time per iteration (ms): 13609.2 | learning rate: 6.058E-06 | global batch size: 16 | lm loss: 7.401906E+00 | loss scale: 16384.0 | grad norm: 119123.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1366/ 159576 | consumed samples: 21856 | elapsed time per iteration (ms): 13916.7 | learning rate: 6.062E-06 | global batch size: 16 | lm loss: 7.338102E+00 | loss scale: 16384.0 | grad norm: 93708.417 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1367/ 159576 | consumed samples: 21872 | elapsed time per iteration (ms): 13536.5 | learning rate: 6.067E-06 | global batch size: 16 | lm loss: 7.494397E+00 | loss scale: 16384.0 | grad norm: 130779.852 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1368/ 159576 | consumed samples: 21888 | elapsed time per iteration (ms): 13577.1 | learning rate: 6.071E-06 | global batch size: 16 | lm loss: 7.007359E+00 | loss scale: 16384.0 | grad norm: 94271.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1369/ 159576 | consumed samples: 21904 | elapsed time per iteration (ms): 13571.4 | learning rate: 6.075E-06 | global batch size: 16 | lm loss: 7.129241E+00 | loss scale: 16384.0 | grad norm: 129962.794 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1370/ 159576 | consumed samples: 21920 | elapsed time per iteration (ms): 13603.2 | learning rate: 6.080E-06 | global batch size: 16 | lm loss: 7.323318E+00 | loss scale: 16384.0 | grad norm: 138541.774 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1371/ 159576 | consumed samples: 21936 | elapsed time per iteration (ms): 13998.6 | learning rate: 6.084E-06 | global batch size: 16 | lm loss: 7.164912E+00 | loss scale: 16384.0 | grad norm: 95366.588 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1372/ 159576 | consumed samples: 21952 | elapsed time per iteration (ms): 13587.8 | learning rate: 6.089E-06 | global batch size: 16 | lm loss: 7.207436E+00 | loss scale: 16384.0 | grad norm: 95481.009 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1373/ 159576 | consumed samples: 21968 | elapsed time per iteration (ms): 13570.1 | learning rate: 6.093E-06 | global batch size: 16 | lm loss: 7.245305E+00 | loss scale: 16384.0 | grad norm: 110814.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1374/ 159576 | consumed samples: 21984 | elapsed time per iteration (ms): 13553.5 | learning rate: 6.098E-06 | global batch size: 16 | lm loss: 7.184179E+00 | loss scale: 16384.0 | grad norm: 92107.034 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1375/ 159576 | consumed samples: 22000 | elapsed time per iteration (ms): 13994.4 | learning rate: 6.102E-06 | global batch size: 16 | lm loss: 7.117487E+00 | loss scale: 16384.0 | grad norm: 77237.913 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1376/ 159576 | consumed samples: 22016 | elapsed time per iteration (ms): 13625.6 | learning rate: 6.107E-06 | global batch size: 16 | lm loss: 7.445632E+00 | loss scale: 16384.0 | grad norm: 139111.184 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1377/ 159576 | consumed samples: 22032 | elapsed time per iteration (ms): 13559.3 | learning rate: 6.111E-06 | global batch size: 16 | lm loss: 7.513434E+00 | loss scale: 16384.0 | grad norm: 111307.588 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1378/ 159576 | consumed samples: 22048 | elapsed time per iteration (ms): 13608.4 | learning rate: 6.115E-06 | global batch size: 16 | lm loss: 7.255265E+00 | loss scale: 16384.0 | grad norm: 88273.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1379/ 159576 | consumed samples: 22064 | elapsed time per iteration (ms): 14048.5 | learning rate: 6.120E-06 | global batch size: 16 | lm loss: 7.123577E+00 | loss scale: 16384.0 | grad norm: 85346.614 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1380/ 159576 | consumed samples: 22080 | elapsed time per iteration (ms): 13485.1 | learning rate: 6.124E-06 | global batch size: 16 | lm loss: 7.134797E+00 | loss scale: 16384.0 | grad norm: 118284.165 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1381/ 159576 | consumed samples: 22096 | elapsed time per iteration (ms): 13616.6 | learning rate: 6.129E-06 | global batch size: 16 | lm loss: 7.281054E+00 | loss scale: 16384.0 | grad norm: 88229.446 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1382/ 159576 | consumed samples: 22112 | elapsed time per iteration (ms): 13576.6 | learning rate: 6.133E-06 | global batch size: 16 | lm loss: 7.397271E+00 | loss scale: 16384.0 | grad norm: 130821.847 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1383/ 159576 | consumed samples: 22128 | elapsed time per iteration (ms): 13587.8 | learning rate: 6.138E-06 | global batch size: 16 | lm loss: 7.362026E+00 | loss scale: 16384.0 | grad norm: 83450.672 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1384/ 159576 | consumed samples: 22144 | elapsed time per iteration (ms): 13848.8 | learning rate: 6.142E-06 | global batch size: 16 | lm loss: 7.275143E+00 | loss scale: 16384.0 | grad norm: 86287.774 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1385/ 159576 | consumed samples: 22160 | elapsed time per iteration (ms): 13576.9 | learning rate: 6.146E-06 | global batch size: 16 | lm loss: 7.400926E+00 | loss scale: 16384.0 | grad norm: 98321.914 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1386/ 159576 | consumed samples: 22176 | elapsed time per iteration (ms): 13627.2 | learning rate: 6.151E-06 | global batch size: 16 | lm loss: 7.151899E+00 | loss scale: 16384.0 | grad norm: 85060.501 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1387/ 159576 | consumed samples: 22192 | elapsed time per iteration (ms): 13519.4 | learning rate: 6.155E-06 | global batch size: 16 | lm loss: 7.335835E+00 | loss scale: 16384.0 | grad norm: 64450.517 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1388/ 159576 | consumed samples: 22208 | elapsed time per iteration (ms): 13906.1 | learning rate: 6.160E-06 | global batch size: 16 | lm loss: 7.316273E+00 | loss scale: 16384.0 | grad norm: 66517.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1389/ 159576 | consumed samples: 22224 | elapsed time per iteration (ms): 13589.2 | learning rate: 6.164E-06 | global batch size: 16 | lm loss: 7.190707E+00 | loss scale: 16384.0 | grad norm: 123710.931 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1390/ 159576 | consumed samples: 22240 | elapsed time per iteration (ms): 13545.5 | learning rate: 6.169E-06 | global batch size: 16 | lm loss: 7.337936E+00 | loss scale: 16384.0 | grad norm: 78178.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1391/ 159576 | consumed samples: 22256 | elapsed time per iteration (ms): 13564.6 | learning rate: 6.173E-06 | global batch size: 16 | lm loss: 7.539785E+00 | loss scale: 16384.0 | grad norm: 111563.102 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1392/ 159576 | consumed samples: 22272 | elapsed time per iteration (ms): 13891.4 | learning rate: 6.178E-06 | global batch size: 16 | lm loss: 7.071362E+00 | loss scale: 16384.0 | grad norm: 70647.575 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1393/ 159576 | consumed samples: 22288 | elapsed time per iteration (ms): 13681.2 | learning rate: 6.182E-06 | global batch size: 16 | lm loss: 7.133610E+00 | loss scale: 16384.0 | grad norm: 124103.863 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1394/ 159576 | consumed samples: 22304 | elapsed time per iteration (ms): 13531.0 | learning rate: 6.186E-06 | global batch size: 16 | lm loss: 7.323411E+00 | loss scale: 16384.0 | grad norm: 99951.813 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1395/ 159576 | consumed samples: 22320 | elapsed time per iteration (ms): 13568.0 | learning rate: 6.191E-06 | global batch size: 16 | lm loss: 7.184701E+00 | loss scale: 16384.0 | grad norm: 71905.862 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1396/ 159576 | consumed samples: 22336 | elapsed time per iteration (ms): 13541.4 | learning rate: 6.195E-06 | global batch size: 16 | lm loss: 7.166233E+00 | loss scale: 16384.0 | grad norm: 81874.132 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1397/ 159576 | consumed samples: 22352 | elapsed time per iteration (ms): 13897.4 | learning rate: 6.200E-06 | global batch size: 16 | lm loss: 7.247505E+00 | loss scale: 16384.0 | grad norm: 84059.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1398/ 159576 | consumed samples: 22368 | elapsed time per iteration (ms): 13621.5 | learning rate: 6.204E-06 | global batch size: 16 | lm loss: 7.240150E+00 | loss scale: 16384.0 | grad norm: 119489.831 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1399/ 159576 | consumed samples: 22384 | elapsed time per iteration (ms): 13579.9 | learning rate: 6.209E-06 | global batch size: 16 | lm loss: 7.294222E+00 | loss scale: 16384.0 | grad norm: 80417.137 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1400/ 159576 | consumed samples: 22400 | elapsed time per iteration (ms): 13625.0 | learning rate: 6.213E-06 | global batch size: 16 | lm loss: 7.203695E+00 | loss scale: 16384.0 | grad norm: 97654.667 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1401/ 159576 | consumed samples: 22416 | elapsed time per iteration (ms): 14002.5 | learning rate: 6.217E-06 | global batch size: 16 | lm loss: 7.173908E+00 | loss scale: 16384.0 | grad norm: 72597.723 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1402/ 159576 | consumed samples: 22432 | elapsed time per iteration (ms): 13559.2 | learning rate: 6.222E-06 | global batch size: 16 | lm loss: 7.213487E+00 | loss scale: 16384.0 | grad norm: 108337.821 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1403/ 159576 | consumed samples: 22448 | elapsed time per iteration (ms): 13615.0 | learning rate: 6.226E-06 | global batch size: 16 | lm loss: 7.295056E+00 | loss scale: 16384.0 | grad norm: 109464.933 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1404/ 159576 | consumed samples: 22464 | elapsed time per iteration (ms): 13479.3 | learning rate: 6.231E-06 | global batch size: 16 | lm loss: 7.070762E+00 | loss scale: 16384.0 | grad norm: 70008.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1405/ 159576 | consumed samples: 22480 | elapsed time per iteration (ms): 13573.2 | learning rate: 6.235E-06 | global batch size: 16 | lm loss: 7.206651E+00 | loss scale: 16384.0 | grad norm: 71456.680 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1406/ 159576 | consumed samples: 22496 | elapsed time per iteration (ms): 13670.7 | learning rate: 6.240E-06 | global batch size: 16 | lm loss: 7.421339E+00 | loss scale: 16384.0 | grad norm: 81529.039 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1407/ 159576 | consumed samples: 22512 | elapsed time per iteration (ms): 13510.9 | learning rate: 6.244E-06 | global batch size: 16 | lm loss: 7.245395E+00 | loss scale: 16384.0 | grad norm: 120780.179 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1408/ 159576 | consumed samples: 22528 | elapsed time per iteration (ms): 13544.4 | learning rate: 6.249E-06 | global batch size: 16 | lm loss: 7.479702E+00 | loss scale: 16384.0 | grad norm: 98091.848 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1409/ 159576 | consumed samples: 22544 | elapsed time per iteration (ms): 13558.7 | learning rate: 6.253E-06 | global batch size: 16 | lm loss: 7.220355E+00 | loss scale: 16384.0 | grad norm: 71818.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1410/ 159576 | consumed samples: 22560 | elapsed time per iteration (ms): 13949.7 | learning rate: 6.257E-06 | global batch size: 16 | lm loss: 7.381415E+00 | loss scale: 16384.0 | grad norm: 80168.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1411/ 159576 | consumed samples: 22576 | elapsed time per iteration (ms): 13573.4 | learning rate: 6.262E-06 | global batch size: 16 | lm loss: 7.330766E+00 | loss scale: 16384.0 | grad norm: 107261.861 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1412/ 159576 | consumed samples: 22592 | elapsed time per iteration (ms): 13522.9 | learning rate: 6.266E-06 | global batch size: 16 | lm loss: 7.378265E+00 | loss scale: 16384.0 | grad norm: 115619.714 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1413/ 159576 | consumed samples: 22608 | elapsed time per iteration (ms): 13584.4 | learning rate: 6.271E-06 | global batch size: 16 | lm loss: 7.202836E+00 | loss scale: 16384.0 | grad norm: 70230.767 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1414/ 159576 | consumed samples: 22624 | elapsed time per iteration (ms): 13797.1 | learning rate: 6.275E-06 | global batch size: 16 | lm loss: 7.202533E+00 | loss scale: 16384.0 | grad norm: 122640.667 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1415/ 159576 | consumed samples: 22640 | elapsed time per iteration (ms): 13736.9 | learning rate: 6.280E-06 | global batch size: 16 | lm loss: 7.271989E+00 | loss scale: 16384.0 | grad norm: 80706.550 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1416/ 159576 | consumed samples: 22656 | elapsed time per iteration (ms): 13603.3 | learning rate: 6.284E-06 | global batch size: 16 | lm loss: 7.350783E+00 | loss scale: 16384.0 | grad norm: 106402.600 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1417/ 159576 | consumed samples: 22672 | elapsed time per iteration (ms): 13663.2 | learning rate: 6.288E-06 | global batch size: 16 | lm loss: 7.629884E+00 | loss scale: 16384.0 | grad norm: 111978.514 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1418/ 159576 | consumed samples: 22688 | elapsed time per iteration (ms): 13512.0 | learning rate: 6.293E-06 | global batch size: 16 | lm loss: 7.276966E+00 | loss scale: 16384.0 | grad norm: 86564.098 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1419/ 159576 | consumed samples: 22704 | elapsed time per iteration (ms): 13947.9 | learning rate: 6.297E-06 | global batch size: 16 | lm loss: 7.109100E+00 | loss scale: 16384.0 | grad norm: 85621.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1420/ 159576 | consumed samples: 22720 | elapsed time per iteration (ms): 13554.6 | learning rate: 6.302E-06 | global batch size: 16 | lm loss: 7.234724E+00 | loss scale: 16384.0 | grad norm: 115238.437 | num zeros: 0.0 | number of skipped iterations:
0 | number of nan iterations: 0 | time (ms) iteration 1421/ 159576 | consumed samples: 22736 | elapsed time per iteration (ms): 13608.2 | learning rate: 6.306E-06 | global batch size: 16 | lm loss: 7.134557E+00 | loss scale: 16384.0 | grad norm: 127475.605 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1422/ 159576 | consumed samples: 22752 | elapsed time per iteration (ms): 13564.6 | learning rate: 6.311E-06 | global batch size: 16 | lm loss: 7.096246E+00 | loss scale: 16384.0 | grad norm: 92678.765 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1423/ 159576 | consumed samples: 22768 | elapsed time per iteration (ms): 13993.7 | learning rate: 6.315E-06 | global batch size: 16 | lm loss: 7.215540E+00 | loss scale: 16384.0 | grad norm: 77823.778 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1424/ 159576 | consumed samples: 22784 | elapsed time per iteration (ms): 13635.8 | learning rate: 6.320E-06 | global batch size: 16 | lm loss: 7.332169E+00 | loss scale: 16384.0 | grad norm: 88585.736 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1425/ 159576 | consumed samples: 22800 | elapsed time per iteration (ms): 13477.0 | learning rate: 6.324E-06 | global batch size: 16 | lm loss: 7.224688E+00 | loss scale: 16384.0 | grad norm: 98593.171 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1426/ 159576 | consumed samples: 22816 | elapsed time per iteration (ms): 13579.9 | learning rate: 6.328E-06 | global batch size: 16 | lm loss: 7.330650E+00 | loss scale: 16384.0 | grad norm: 101929.983 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1427/ 159576 | consumed samples: 22832 | elapsed time per iteration (ms): 13559.4 | learning rate: 6.333E-06 | global batch size: 16 | lm loss: 7.261027E+00 | loss scale: 16384.0 | grad norm: 79893.479 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1428/ 159576 | consumed samples: 22848 | elapsed time per iteration (ms): 13656.6 | learning rate: 6.337E-06 | global batch size: 16 | lm loss: 7.050019E+00 | loss scale: 16384.0 | grad norm: 197668.137 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1429/ 159576 | consumed samples: 22864 | elapsed time per iteration (ms): 13549.3 | learning rate: 6.342E-06 | global batch size: 16 | lm loss: 7.283052E+00 | loss scale: 16384.0 | grad norm: 185482.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1430/ 159576 | consumed samples: 22880 | elapsed time per iteration (ms): 13566.6 | learning rate: 6.346E-06 | global batch size: 16 | lm loss: 7.251038E+00 | loss scale: 16384.0 | grad norm: 81246.801 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1431/ 159576 | consumed samples: 22896 | elapsed time per iteration (ms): 13626.6 | learning rate: 6.351E-06 | global batch size: 16 | lm loss: 7.363044E+00 | loss scale: 16384.0 | grad norm: 89555.992 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1432/ 159576 | consumed samples: 22912 | elapsed time per iteration (ms): 14023.4 | learning rate: 6.355E-06 | global batch size: 16 | lm 
loss: 7.350190E+00 | loss scale: 16384.0 | grad norm: 151476.896 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1433/ 159576 | consumed samples: 22928 | elapsed time per iteration (ms): 13376.0 | learning rate: 6.359E-06 | global batch size: 16 | lm loss: 7.294331E+00 | loss scale: 16384.0 | grad norm: 148300.162 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1434/ 159576 | consumed samples: 22944 | elapsed time per iteration (ms): 13594.6 | learning rate: 6.364E-06 | global batch size: 16 | lm loss: 7.178850E+00 | loss scale: 16384.0 | grad norm: 115814.774 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1435/ 159576 | consumed samples: 22960 | elapsed time per iteration (ms): 13589.5 | learning rate: 6.368E-06 | global batch size: 16 | lm loss: 7.174537E+00 | loss scale: 16384.0 | grad norm: 89057.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1436/ 159576 | consumed samples: 22976 | elapsed time per iteration (ms): 13854.5 | learning rate: 6.373E-06 | global batch size: 16 | lm loss: 7.455090E+00 | loss scale: 16384.0 | grad norm: 143357.692 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1437/ 159576 | consumed samples: 22992 | elapsed time per iteration (ms): 13800.5 | learning rate: 6.377E-06 | global batch size: 16 | lm loss: 7.230480E+00 | loss scale: 16384.0 | grad norm: 124647.889 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1438/ 159576 | consumed samples: 23008 | elapsed time per iteration (ms): 13574.3 | learning rate: 6.382E-06 | global batch size: 16 | lm loss: 7.214196E+00 | loss scale: 16384.0 | grad norm: 90534.924 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1439/ 159576 | consumed samples: 23024 | elapsed time per iteration (ms): 13559.7 | learning rate: 6.386E-06 | global batch size: 16 | lm loss: 7.228687E+00 | loss scale: 16384.0 | grad norm: 100823.134 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1440/ 159576 | consumed samples: 23040 | elapsed time per iteration (ms): 13580.1 | learning rate: 6.391E-06 | global batch size: 16 | lm loss: 7.297411E+00 | loss scale: 16384.0 | grad norm: 72207.799 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1441/ 159576 | consumed samples: 23056 | elapsed time per iteration (ms): 13763.6 | learning rate: 6.395E-06 | global batch size: 16 | lm loss: 7.403437E+00 | loss scale: 16384.0 | grad norm: 227400.170 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1442/ 159576 | consumed samples: 23072 | elapsed time per iteration (ms): 13606.0 | learning rate: 6.399E-06 | global batch size: 16 | lm loss: 7.267770E+00 | loss scale: 16384.0 | grad norm: 178424.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1443/ 159576 | consumed samples: 23088 | elapsed time per iteration (ms): 13579.5 | learning rate: 6.404E-06 | global batch size: 16 | lm loss: 7.196310E+00 | loss scale: 16384.0 | grad norm: 93737.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1444/ 159576 | consumed 
samples: 23104 | elapsed time per iteration (ms): 13564.8 | learning rate: 6.408E-06 | global batch size: 16 | lm loss: 7.180475E+00 | loss scale: 16384.0 | grad norm: 107567.132 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1445/ 159576 | consumed samples: 23120 | elapsed time per iteration (ms): 14086.1 | learning rate: 6.413E-06 | global batch size: 16 | lm loss: 7.235699E+00 | loss scale: 16384.0 | grad norm: 90017.706 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1446/ 159576 | consumed samples: 23136 | elapsed time per iteration (ms): 13420.4 | learning rate: 6.417E-06 | global batch size: 16 | lm loss: 7.131771E+00 | loss scale: 16384.0 | grad norm: 200715.783 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1447/ 159576 | consumed samples: 23152 | elapsed time per iteration (ms): 13582.8 | learning rate: 6.422E-06 | global batch size: 16 | lm loss: 7.147336E+00 | loss scale: 16384.0 | grad norm: 139041.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1448/ 159576 | consumed samples: 23168 | elapsed time per iteration (ms): 13591.5 | learning rate: 6.426E-06 | global batch size: 16 | lm loss: 7.223548E+00 | loss scale: 16384.0 | grad norm: 81314.906 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1449/ 159576 | consumed samples: 23184 | elapsed time per iteration (ms): 13543.2 | learning rate: 6.430E-06 | global batch size: 16 | lm loss: 7.126436E+00 | loss scale: 16384.0 | grad norm: 104656.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1450/ 159576 | consumed samples: 23200 | elapsed time per iteration (ms): 13771.0 | learning rate: 6.435E-06 | global batch size: 16 | lm loss: 7.239769E+00 | loss scale: 16384.0 | grad norm: 55782.887 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1451/ 159576 | consumed samples: 23216 | elapsed time per iteration (ms): 13581.7 | learning rate: 6.439E-06 | global batch size: 16 | lm loss: 7.431156E+00 | loss scale: 16384.0 | grad norm: 265376.495 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1452/ 159576 | consumed samples: 23232 | elapsed time per iteration (ms): 13633.4 | learning rate: 6.444E-06 | global batch size: 16 | lm loss: 7.120412E+00 | loss scale: 16384.0 | grad norm: 153821.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1453/ 159576 | consumed samples: 23248 | elapsed time per iteration (ms): 13510.9 | learning rate: 6.448E-06 | global batch size: 16 | lm loss: 7.361814E+00 | loss scale: 16384.0 | grad norm: 91484.610 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1454/ 159576 | consumed samples: 23264 | elapsed time per iteration (ms): 14008.9 | learning rate: 6.453E-06 | global batch size: 16 | lm loss: 7.429213E+00 | loss scale: 16384.0 | grad norm: 95193.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1455/ 159576 | consumed samples: 23280 | elapsed time per iteration (ms): 13534.7 | learning rate: 6.457E-06 | global batch size: 16 | lm loss: 7.311771E+00 | loss scale: 16384.0 | grad norm: 99688.210 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1456/ 159576 | consumed samples: 23296 | elapsed time per iteration (ms): 13570.9 | learning rate: 6.462E-06 | global batch size: 16 | lm loss: 7.326795E+00 | loss scale: 16384.0 | grad norm: 199002.918 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1457/ 159576 | consumed samples: 23312 | elapsed time per iteration (ms): 13567.6 | learning rate: 6.466E-06 | global batch size: 16 | lm loss: 7.238305E+00 | loss scale: 16384.0 | grad norm: 148524.516 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1458/ 159576 | consumed samples: 23328 | elapsed time per iteration (ms): 14002.9 | learning rate: 6.470E-06 | global batch size: 16 | lm loss: 7.170752E+00 | loss scale: 16384.0 | grad norm: 83892.787 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1459/ 159576 | consumed samples: 23344 | elapsed time per iteration (ms): 13758.9 | learning rate: 6.475E-06 | global batch size: 16 | lm loss: 7.148302E+00 | loss scale: 16384.0 | grad norm: 92326.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1460/ 159576 | consumed samples: 23360 | elapsed time per iteration (ms): 13596.9 | learning rate: 6.479E-06 | global batch size: 16 | lm loss: 7.386099E+00 | loss scale: 16384.0 | grad norm: 141912.785 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1461/ 159576 | consumed samples: 23376 | elapsed time per iteration (ms): 13627.4 | learning rate: 6.484E-06 | global batch size: 16 | lm loss: 7.288848E+00 | loss scale: 16384.0 | grad norm: 170265.777 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1462/ 159576 | consumed samples: 23392 | elapsed time per iteration (ms): 13618.4 | learning rate: 6.488E-06 | global batch size: 16 | lm loss: 7.229756E+00 | loss scale: 16384.0 | grad norm: 120999.804 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1463/ 159576 | consumed samples: 23408 | elapsed time per iteration (ms): 13656.7 | learning rate: 6.493E-06 | global batch size: 16 | lm loss: 7.281564E+00 | loss scale: 16384.0 | grad norm: 93039.502 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1464/ 159576 | consumed samples: 23424 | elapsed time per iteration (ms): 13645.1 | learning rate: 6.497E-06 | global batch size: 16 | lm loss: 7.287534E+00 | loss scale: 16384.0 | grad norm: 80620.713 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1465/ 159576 | consumed samples: 23440 | elapsed time per iteration (ms): 13567.3 | learning rate: 6.501E-06 | global batch size: 16 | lm loss: 7.328496E+00 | loss scale: 16384.0 | grad norm: 125622.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1466/ 159576 | consumed samples: 23456 | elapsed time per iteration (ms): 13597.3 | learning rate: 6.506E-06 | global batch size: 16 | lm loss: 7.289563E+00 | loss scale: 16384.0 | grad norm: 115928.663 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1467/ 159576 | consumed samples: 23472 | elapsed time per iteration (ms): 13941.8 | learning rate: 
6.510E-06 | global batch size: 16 | lm loss: 7.383677E+00 | loss scale: 16384.0 | grad norm: 88787.769 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1468/ 159576 | consumed samples: 23488 | elapsed time per iteration (ms): 13557.9 | learning rate: 6.515E-06 | global batch size: 16 | lm loss: 7.200576E+00 | loss scale: 16384.0 | grad norm: 72136.963 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1469/ 159576 | consumed samples: 23504 | elapsed time per iteration (ms): 13659.8 | learning rate: 6.519E-06 | global batch size: 16 | lm loss: 7.237146E+00 | loss scale: 16384.0 | grad norm: 80384.892 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1470/ 159576 | consumed samples: 23520 | elapsed time per iteration (ms): 13520.5 | learning rate: 6.524E-06 | global batch size: 16 | lm loss: 7.087498E+00 | loss scale: 16384.0 | grad norm: 84910.064 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1471/ 159576 | consumed samples: 23536 | elapsed time per iteration (ms): 13587.4 | learning rate: 6.528E-06 | global batch size: 16 | lm loss: 7.201303E+00 | loss scale: 16384.0 | grad norm: 82344.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1472/ 159576 | consumed samples: 23552 | elapsed time per iteration (ms): 13785.3 | learning rate: 6.533E-06 | global batch size: 16 | lm loss: 7.099293E+00 | loss scale: 16384.0 | grad norm: 90694.938 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1473/ 159576 | consumed samples: 23568 | elapsed time per iteration (ms): 13564.5 | learning rate: 6.537E-06 | global batch size: 16 | lm loss: 7.241871E+00 | loss scale: 16384.0 | grad norm: 49829.478 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1474/ 159576 | consumed samples: 23584 | elapsed time per iteration (ms): 13624.0 | learning rate: 6.541E-06 | global batch size: 16 | lm loss: 7.157920E+00 | loss scale: 16384.0 | grad norm: 134064.505 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1475/ 159576 | consumed samples: 23600 | elapsed time per iteration (ms): 13651.2 | learning rate: 6.546E-06 | global batch size: 16 | lm loss: 7.214240E+00 | loss scale: 16384.0 | grad norm: 86872.151 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1476/ 159576 | consumed samples: 23616 | elapsed time per iteration (ms): 14166.8 | learning rate: 6.550E-06 | global batch size: 16 | lm loss: 7.192460E+00 | loss scale: 16384.0 | grad norm: 80848.938 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1477/ 159576 | consumed samples: 23632 | elapsed time per iteration (ms): 13604.7 | learning rate: 6.555E-06 | global batch size: 16 | lm loss: 7.323776E+00 | loss scale: 16384.0 | grad norm: 70454.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1478/ 159576 | consumed samples: 23648 | elapsed time per iteration (ms): 13572.6 | learning rate: 6.559E-06 | global batch size: 16 | lm loss: 7.268590E+00 | loss scale: 16384.0 | grad norm: 71693.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) 
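Two things in the surrounding records are worth decoding. First, the bookkeeping is plain arithmetic: consumed samples = iteration * global batch size (1478 * 16 = 23648 in the record above), and at roughly 13.6 s per iteration a naive extrapolation over the remaining 159576 - 1478 steps comes to about 25 days. Naive, because it assumes the global batch size stays at 16; runs like this typically ramp the batch size up later, which changes both numbers. Second, at iteration 1500 below, the loss scale doubles from 16384.0 to 32768.0 at the same time as a checkpoint is written (the reported save-checkpoint time of 17270.67 ms is also why iteration 1501 shows about 31.4 s instead of the usual 13.6 s). The doubling is standard dynamic loss scaling for mixed-precision training: after a fixed window of overflow-free steps (hence "number of skipped iterations: 0" throughout), the scaler multiplies the loss scale by its growth factor, and on an overflow it would instead halve the scale and skip the step. A toy sketch of that policy in Python; the window and growth factor are illustrative defaults, not this run's actual configuration:

class DynamicLossScaler:
    """Toy dynamic loss scaler: grow after `window` clean steps, back off on overflow.

    window=1000 and factor=2.0 are illustrative, not read from the run's config.
    """

    def __init__(self, scale=16384.0, factor=2.0, window=1000):
        self.scale = scale        # current loss scale, e.g. 16384.0 above
        self.factor = factor      # growth/backoff factor
        self.window = window      # overflow-free steps required before growing
        self.clean_steps = 0

    def update(self, found_overflow):
        """Advance one step; return False if the step must be skipped."""
        if found_overflow:
            self.scale /= self.factor   # back off and skip the update
            self.clean_steps = 0
            return False
        self.clean_steps += 1
        if self.clean_steps == self.window:
            self.scale *= self.factor   # e.g. 16384.0 -> 32768.0
            self.clean_steps = 0
        return True

# After one full overflow-free window the scale has doubled exactly once:
scaler = DynamicLossScaler(scale=16384.0)
assert all(scaler.update(found_overflow=False) for _ in range(1000))
print(scaler.scale)  # 32768.0, the value reported from iteration 1500 onward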
iteration 1479/ 159576 | consumed samples: 23664 | elapsed time per iteration (ms): 13608.6 | learning rate: 6.564E-06 | global batch size: 16 | lm loss: 7.296487E+00 | loss scale: 16384.0 | grad norm: 81654.087 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1480/ 159576 | consumed samples: 23680 | elapsed time per iteration (ms): 14039.7 | learning rate: 6.568E-06 | global batch size: 16 | lm loss: 7.090362E+00 | loss scale: 16384.0 | grad norm: 64201.153 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1481/ 159576 | consumed samples: 23696 | elapsed time per iteration (ms): 13583.2 | learning rate: 6.572E-06 | global batch size: 16 | lm loss: 7.375229E+00 | loss scale: 16384.0 | grad norm: 113007.126 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1482/ 159576 | consumed samples: 23712 | elapsed time per iteration (ms): 13660.9 | learning rate: 6.577E-06 | global batch size: 16 | lm loss: 7.293176E+00 | loss scale: 16384.0 | grad norm: 77498.464 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1483/ 159576 | consumed samples: 23728 | elapsed time per iteration (ms): 13614.0 | learning rate: 6.581E-06 | global batch size: 16 | lm loss: 7.336072E+00 | loss scale: 16384.0 | grad norm: 110912.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1484/ 159576 | consumed samples: 23744 | elapsed time per iteration (ms): 13566.7 | learning rate: 6.586E-06 | global batch size: 16 | lm loss: 7.364174E+00 | loss scale: 16384.0 | grad norm: 183688.896 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1485/ 159576 | consumed samples: 23760 | elapsed time per iteration (ms): 13815.4 | learning rate: 6.590E-06 | global batch size: 16 | lm loss: 7.239150E+00 | loss scale: 16384.0 | grad norm: 72249.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1486/ 159576 | consumed samples: 23776 | elapsed time per iteration (ms): 13589.6 | learning rate: 6.595E-06 | global batch size: 16 | lm loss: 7.200100E+00 | loss scale: 16384.0 | grad norm: 96228.791 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1487/ 159576 | consumed samples: 23792 | elapsed time per iteration (ms): 13607.7 | learning rate: 6.599E-06 | global batch size: 16 | lm loss: 7.292061E+00 | loss scale: 16384.0 | grad norm: 121424.509 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1488/ 159576 | consumed samples: 23808 | elapsed time per iteration (ms): 13632.1 | learning rate: 6.604E-06 | global batch size: 16 | lm loss: 7.136326E+00 | loss scale: 16384.0 | grad norm: 126581.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1489/ 159576 | consumed samples: 23824 | elapsed time per iteration (ms): 14024.4 | learning rate: 6.608E-06 | global batch size: 16 | lm loss: 7.314082E+00 | loss scale: 16384.0 | grad norm: 81672.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1490/ 159576 | consumed samples: 23840 | elapsed time per iteration (ms): 13562.3 | learning rate: 6.612E-06 | global batch size: 16 | lm loss: 7.220848E+00 | loss scale: 16384.0 | 
grad norm: 124864.436 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1491/ 159576 | consumed samples: 23856 | elapsed time per iteration (ms): 13573.1 | learning rate: 6.617E-06 | global batch size: 16 | lm loss: 7.139018E+00 | loss scale: 16384.0 | grad norm: 91430.675 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1492/ 159576 | consumed samples: 23872 | elapsed time per iteration (ms): 13614.3 | learning rate: 6.621E-06 | global batch size: 16 | lm loss: 7.268013E+00 | loss scale: 16384.0 | grad norm: 135716.036 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1493/ 159576 | consumed samples: 23888 | elapsed time per iteration (ms): 13616.6 | learning rate: 6.626E-06 | global batch size: 16 | lm loss: 7.252588E+00 | loss scale: 16384.0 | grad norm: 83740.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1494/ 159576 | consumed samples: 23904 | elapsed time per iteration (ms): 13959.7 | learning rate: 6.630E-06 | global batch size: 16 | lm loss: 6.975100E+00 | loss scale: 16384.0 | grad norm: 83284.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1495/ 159576 | consumed samples: 23920 | elapsed time per iteration (ms): 13605.9 | learning rate: 6.635E-06 | global batch size: 16 | lm loss: 7.372656E+00 | loss scale: 16384.0 | grad norm: 69225.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1496/ 159576 | consumed samples: 23936 | elapsed time per iteration (ms): 13623.3 | learning rate: 6.639E-06 | global batch size: 16 | lm loss: 7.219198E+00 | loss scale: 16384.0 | grad norm: 115429.784 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1497/ 159576 | consumed samples: 23952 | elapsed time per iteration (ms): 13627.9 | learning rate: 6.643E-06 | global batch size: 16 | lm loss: 7.340521E+00 | loss scale: 16384.0 | grad norm: 85290.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1498/ 159576 | consumed samples: 23968 | elapsed time per iteration (ms): 13884.3 | learning rate: 6.648E-06 | global batch size: 16 | lm loss: 7.186238E+00 | loss scale: 16384.0 | grad norm: 114903.673 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1499/ 159576 | consumed samples: 23984 | elapsed time per iteration (ms): 13657.7 | learning rate: 6.652E-06 | global batch size: 16 | lm loss: 7.208917E+00 | loss scale: 16384.0 | grad norm: 92023.782 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1500/ 159576 | consumed samples: 24000 | elapsed time per iteration (ms): 13609.5 | learning rate: 6.657E-06 | global batch size: 16 | lm loss: 7.191697E+00 | loss scale: 32768.0 | grad norm: 96438.628 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 1500 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
[2021-09-24 08:02:02,260] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/global_step1500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 1500 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
time (ms) | save-checkpoint: 17270.67
iteration 1501/ 159576 | consumed samples: 24016 | elapsed time per iteration (ms): 31402.6 | learning rate: 6.661E-06 | global batch size: 16 | lm loss: 7.276592E+00 | loss scale: 32768.0 | grad norm: 161966.188 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1502/ 159576 | consumed samples: 24032 | elapsed time per iteration (ms): 13594.3 | learning rate: 6.666E-06 | global batch size: 16 | lm loss: 7.318794E+00 | loss scale: 32768.0 | grad norm: 194567.067 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1503/ 159576 | consumed samples: 24048 | elapsed time per iteration (ms): 13587.2 | learning rate: 6.670E-06 | global batch size: 16 | lm loss: 7.168730E+00 | loss scale: 32768.0 | grad norm: 147316.587 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1504/ 159576 | consumed samples: 24064 | elapsed time per iteration (ms): 13690.3 | learning rate: 6.675E-06 | global batch size: 16 | lm loss: 7.199265E+00 | loss scale: 32768.0 | grad norm: 160502.917 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1505/ 159576 | consumed samples: 24080 | elapsed time per iteration (ms): 14065.5 | learning rate: 6.679E-06 | global batch size: 16 | lm loss: 7.004994E+00 | loss scale: 32768.0 | grad norm: 126147.516 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1506/ 159576 | consumed samples: 24096 | elapsed time per iteration (ms): 13542.1 | learning rate: 6.683E-06 | global batch size: 16 | lm loss: 7.322471E+00 | loss scale: 32768.0 | grad norm: 196683.898 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1507/ 159576 | consumed samples: 24112 | elapsed time per iteration (ms): 13669.0 | learning rate: 6.688E-06 | global batch size: 16 | lm loss: 7.393982E+00 | loss scale: 32768.0 | grad norm: 190898.758 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-24 08:03:56] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1162855_[2-10%1] on 'gpu_p13' partition)
[2021-09-24 08:03:56] PULSE: tr8-104B is running for 2:11:45 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8]))
iteration 1508/ 159576 | consumed samples: 24128 | elapsed time per iteration (ms): 13530.1 | learning rate: 6.692E-06 | global batch size: 16 | lm loss: 7.303823E+00 | loss scale: 32768.0 | grad norm: 138876.766 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1509/ 159576 | consumed samples: 24144 | elapsed time per iteration (ms): 13620.2 | learning rate: 6.697E-06 | global batch size: 16 | lm loss: 7.181733E+00 | loss scale: 32768.0 | grad norm: 245330.128 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1510/ 159576 | consumed samples: 24160 | elapsed time per iteration (ms): 13857.7 | learning rate: 6.701E-06 | global batch size: 16 | lm loss: 7.249762E+00 | loss scale: 32768.0 |
grad norm: 178346.781 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1511/ 159576 | consumed samples: 24176 | elapsed time per iteration (ms): 13642.0 | learning rate: 6.706E-06 | global batch size: 16 | lm loss: 7.141682E+00 | loss scale: 32768.0 | grad norm: 225502.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1512/ 159576 | consumed samples: 24192 | elapsed time per iteration (ms): 13680.2 | learning rate: 6.710E-06 | global batch size: 16 | lm loss: 7.262461E+00 | loss scale: 32768.0 | grad norm: 152013.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1513/ 159576 | consumed samples: 24208 | elapsed time per iteration (ms): 6867.5 | learning rate: 6.710E-06 | global batch size: 16 | lm loss: 7.117817E+00 | loss scale: 32768.0 | grad norm: 152013.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1514/ 159576 | consumed samples: 24224 | elapsed time per iteration (ms): 13192.9 | learning rate: 6.714E-06 | global batch size: 16 | lm loss: 7.508438E+00 | loss scale: 32768.0 | grad norm: 277772.591 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1515/ 159576 | consumed samples: 24240 | elapsed time per iteration (ms): 13697.2 | learning rate: 6.719E-06 | global batch size: 16 | lm loss: 7.055306E+00 | loss scale: 32768.0 | grad norm: 184291.975 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1516/ 159576 | consumed samples: 24256 | elapsed time per iteration (ms): 13601.8 | learning rate: 6.723E-06 | global batch size: 16 | lm loss: 7.364224E+00 | loss scale: 32768.0 | grad norm: 153076.917 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1517/ 159576 | consumed samples: 24272 | elapsed time per iteration (ms): 13603.6 | learning rate: 6.728E-06 | global batch size: 16 | lm loss: 6.912699E+00 | loss scale: 32768.0 | grad norm: 218098.104 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1518/ 159576 | consumed samples: 24288 | elapsed time per iteration (ms): 13640.7 | learning rate: 6.732E-06 | global batch size: 16 | lm loss: 7.323909E+00 | loss scale: 32768.0 | grad norm: 216972.778 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1519/ 159576 | consumed samples: 24304 | elapsed time per iteration (ms): 14045.8 | learning rate: 6.737E-06 | global batch size: 16 | lm loss: 7.068207E+00 | loss scale: 32768.0 | grad norm: 118810.539 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1520/ 159576 | consumed samples: 24320 | elapsed time per iteration (ms): 13595.0 | learning rate: 6.741E-06 | global batch size: 16 | lm loss: 7.160398E+00 | loss scale: 32768.0 | grad norm: 174748.456 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1521/ 159576 | consumed samples: 24336 | elapsed time per iteration (ms): 13611.5 | learning rate: 6.746E-06 | global batch size: 16 | lm loss: 7.170628E+00 | loss scale: 32768.0 | grad norm: 146800.781 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1522/ 159576 | consumed samples: 24352 | elapsed time per 
iteration (ms): 13576.3 | learning rate: 6.750E-06 | global batch size: 16 | lm loss: 7.141685E+00 | loss scale: 32768.0 | grad norm: 301970.136 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1523/ 159576 | consumed samples: 24368 | elapsed time per iteration (ms): 13818.0 | learning rate: 6.754E-06 | global batch size: 16 | lm loss: 7.351134E+00 | loss scale: 32768.0 | grad norm: 203560.816 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1524/ 159576 | consumed samples: 24384 | elapsed time per iteration (ms): 13700.8 | learning rate: 6.759E-06 | global batch size: 16 | lm loss: 7.291396E+00 | loss scale: 32768.0 | grad norm: 186296.459 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1525/ 159576 | consumed samples: 24400 | elapsed time per iteration (ms): 13611.8 | learning rate: 6.763E-06 | global batch size: 16 | lm loss: 7.052688E+00 | loss scale: 32768.0 | grad norm: 186235.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1526/ 159576 | consumed samples: 24416 | elapsed time per iteration (ms): 13626.5 | learning rate: 6.768E-06 | global batch size: 16 | lm loss: 7.083735E+00 | loss scale: 32768.0 | grad norm: 254298.754 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1527/ 159576 | consumed samples: 24432 | elapsed time per iteration (ms): 13677.9 | learning rate: 6.772E-06 | global batch size: 16 | lm loss: 7.212967E+00 | loss scale: 32768.0 | grad norm: 290009.050 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1528/ 159576 | consumed samples: 24448 | elapsed time per iteration (ms): 13998.5 | learning rate: 6.777E-06 | global batch size: 16 | lm loss: 7.249606E+00 | loss scale: 32768.0 | grad norm: 193082.466 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1529/ 159576 | consumed samples: 24464 | elapsed time per iteration (ms): 13543.2 | learning rate: 6.781E-06 | global batch size: 16 | lm loss: 7.187498E+00 | loss scale: 32768.0 | grad norm: 161368.154 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1530/ 159576 | consumed samples: 24480 | elapsed time per iteration (ms): 13565.1 | learning rate: 6.786E-06 | global batch size: 16 | lm loss: 7.266234E+00 | loss scale: 32768.0 | grad norm: 198639.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1531/ 159576 | consumed samples: 24496 | elapsed time per iteration (ms): 13571.4 | learning rate: 6.790E-06 | global batch size: 16 | lm loss: 7.528541E+00 | loss scale: 32768.0 | grad norm: 545404.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1532/ 159576 | consumed samples: 24512 | elapsed time per iteration (ms): 13970.0 | learning rate: 6.794E-06 | global batch size: 16 | lm loss: 7.212701E+00 | loss scale: 32768.0 | grad norm: 227881.927 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1533/ 159576 | consumed samples: 24528 | elapsed time per iteration (ms): 13566.3 | learning rate: 6.799E-06 | global batch size: 16 | lm loss: 7.440462E+00 | loss scale: 32768.0 | grad norm: 170454.067 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1534/ 159576 | consumed samples: 24544 | elapsed time per iteration (ms): 13611.2 | learning rate: 6.803E-06 | global batch size: 16 | lm loss: 7.264073E+00 | loss scale: 32768.0 | grad norm: 306199.566 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1535/ 159576 | consumed samples: 24560 | elapsed time per iteration (ms): 13661.5 | learning rate: 6.808E-06 | global batch size: 16 | lm loss: 7.109380E+00 | loss scale: 32768.0 | grad norm: 130108.699 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1536/ 159576 | consumed samples: 24576 | elapsed time per iteration (ms): 13539.1 | learning rate: 6.812E-06 | global batch size: 16 | lm loss: 7.475006E+00 | loss scale: 32768.0 | grad norm: 447958.462 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1537/ 159576 | consumed samples: 24592 | elapsed time per iteration (ms): 13698.1 | learning rate: 6.817E-06 | global batch size: 16 | lm loss: 7.372583E+00 | loss scale: 32768.0 | grad norm: 233240.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1538/ 159576 | consumed samples: 24608 | elapsed time per iteration (ms): 13601.5 | learning rate: 6.821E-06 | global batch size: 16 | lm loss: 7.208574E+00 | loss scale: 32768.0 | grad norm: 208866.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1539/ 159576 | consumed samples: 24624 | elapsed time per iteration (ms): 13645.6 | learning rate: 6.825E-06 | global batch size: 16 | lm loss: 7.209548E+00 | loss scale: 32768.0 | grad norm: 290418.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1540/ 159576 | consumed samples: 24640 | elapsed time per iteration (ms): 13628.1 | learning rate: 6.830E-06 | global batch size: 16 | lm loss: 7.168006E+00 | loss scale: 32768.0 | grad norm: 271187.490 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1541/ 159576 | consumed samples: 24656 | elapsed time per iteration (ms): 14103.2 | learning rate: 6.834E-06 | global batch size: 16 | lm loss: 7.235812E+00 | loss scale: 32768.0 | grad norm: 368637.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1542/ 159576 | consumed samples: 24672 | elapsed time per iteration (ms): 13752.7 | learning rate: 6.839E-06 | global batch size: 16 | lm loss: 7.205466E+00 | loss scale: 32768.0 | grad norm: 275606.149 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1543/ 159576 | consumed samples: 24688 | elapsed time per iteration (ms): 13526.0 | learning rate: 6.843E-06 | global batch size: 16 | lm loss: 7.152663E+00 | loss scale: 32768.0 | grad norm: 186385.977 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1544/ 159576 | consumed samples: 24704 | elapsed time per iteration (ms): 13591.1 | learning rate: 6.848E-06 | global batch size: 16 | lm loss: 7.402153E+00 | loss scale: 32768.0 | grad norm: 202784.884 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1545/ 159576 | consumed samples: 24720 | elapsed time per iteration (ms): 13853.8 | learning rate: 6.852E-06 | global 
batch size: 16 | lm loss: 7.254861E+00 | loss scale: 32768.0 | grad norm: 302847.689 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1546/ 159576 | consumed samples: 24736 | elapsed time per iteration (ms): 13718.3 | learning rate: 6.857E-06 | global batch size: 16 | lm loss: 7.259928E+00 | loss scale: 32768.0 | grad norm: 190927.131 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1547/ 159576 | consumed samples: 24752 | elapsed time per iteration (ms): 13565.0 | learning rate: 6.861E-06 | global batch size: 16 | lm loss: 7.226044E+00 | loss scale: 32768.0 | grad norm: 147732.617 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1548/ 159576 | consumed samples: 24768 | elapsed time per iteration (ms): 13562.3 | learning rate: 6.865E-06 | global batch size: 16 | lm loss: 7.106945E+00 | loss scale: 32768.0 | grad norm: 275364.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1549/ 159576 | consumed samples: 24784 | elapsed time per iteration (ms): 13573.3 | learning rate: 6.870E-06 | global batch size: 16 | lm loss: 7.157021E+00 | loss scale: 32768.0 | grad norm: 180244.172 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1550/ 159576 | consumed samples: 24800 | elapsed time per iteration (ms): 13916.8 | learning rate: 6.874E-06 | global batch size: 16 | lm loss: 7.001479E+00 | loss scale: 32768.0 | grad norm: 268566.065 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1551/ 159576 | consumed samples: 24816 | elapsed time per iteration (ms): 13651.8 | learning rate: 6.879E-06 | global batch size: 16 | lm loss: 7.167608E+00 | loss scale: 32768.0 | grad norm: 198735.053 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1552/ 159576 | consumed samples: 24832 | elapsed time per iteration (ms): 13608.0 | learning rate: 6.883E-06 | global batch size: 16 | lm loss: 7.093953E+00 | loss scale: 32768.0 | grad norm: 170933.719 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1553/ 159576 | consumed samples: 24848 | elapsed time per iteration (ms): 13517.6 | learning rate: 6.888E-06 | global batch size: 16 | lm loss: 7.234317E+00 | loss scale: 32768.0 | grad norm: 237231.760 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1554/ 159576 | consumed samples: 24864 | elapsed time per iteration (ms): 14011.1 | learning rate: 6.892E-06 | global batch size: 16 | lm loss: 7.130560E+00 | loss scale: 32768.0 | grad norm: 237902.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1555/ 159576 | consumed samples: 24880 | elapsed time per iteration (ms): 13510.9 | learning rate: 6.896E-06 | global batch size: 16 | lm loss: 7.275712E+00 | loss scale: 32768.0 | grad norm: 149656.891 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1556/ 159576 | consumed samples: 24896 | elapsed time per iteration (ms): 13617.0 | learning rate: 6.901E-06 | global batch size: 16 | lm loss: 7.239087E+00 | loss scale: 32768.0 | grad norm: 186987.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 
1557/ 159576 | consumed samples: 24912 | elapsed time per iteration (ms): 13622.7 | learning rate: 6.905E-06 | global batch size: 16 | lm loss: 6.972548E+00 | loss scale: 32768.0 | grad norm: 167404.940 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1558/ 159576 | consumed samples: 24928 | elapsed time per iteration (ms): 13629.7 | learning rate: 6.910E-06 | global batch size: 16 | lm loss: 7.274665E+00 | loss scale: 32768.0 | grad norm: 170409.995 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1559/ 159576 | consumed samples: 24944 | elapsed time per iteration (ms): 13856.8 | learning rate: 6.914E-06 | global batch size: 16 | lm loss: 7.320499E+00 | loss scale: 32768.0 | grad norm: 139509.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1560/ 159576 | consumed samples: 24960 | elapsed time per iteration (ms): 13572.0 | learning rate: 6.919E-06 | global batch size: 16 | lm loss: 7.481147E+00 | loss scale: 32768.0 | grad norm: 204961.182 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1561/ 159576 | consumed samples: 24976 | elapsed time per iteration (ms): 13609.9 | learning rate: 6.923E-06 | global batch size: 16 | lm loss: 7.318799E+00 | loss scale: 32768.0 | grad norm: 233741.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1562/ 159576 | consumed samples: 24992 | elapsed time per iteration (ms): 13593.5 | learning rate: 6.928E-06 | global batch size: 16 | lm loss: 6.970228E+00 | loss scale: 32768.0 | grad norm: 159417.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1563/ 159576 | consumed samples: 25008 | elapsed time per iteration (ms): 13894.7 | learning rate: 6.932E-06 | global batch size: 16 | lm loss: 7.266310E+00 | loss scale: 32768.0 | grad norm: 154081.846 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1564/ 159576 | consumed samples: 25024 | elapsed time per iteration (ms): 13687.0 | learning rate: 6.936E-06 | global batch size: 16 | lm loss: 7.274476E+00 | loss scale: 32768.0 | grad norm: 258666.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1565/ 159576 | consumed samples: 25040 | elapsed time per iteration (ms): 13663.3 | learning rate: 6.941E-06 | global batch size: 16 | lm loss: 7.125623E+00 | loss scale: 32768.0 | grad norm: 167968.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1566/ 159576 | consumed samples: 25056 | elapsed time per iteration (ms): 13604.1 | learning rate: 6.945E-06 | global batch size: 16 | lm loss: 7.210727E+00 | loss scale: 32768.0 | grad norm: 198543.646 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1567/ 159576 | consumed samples: 25072 | elapsed time per iteration (ms): 14015.2 | learning rate: 6.950E-06 | global batch size: 16 | lm loss: 7.245472E+00 | loss scale: 32768.0 | grad norm: 149711.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1568/ 159576 | consumed samples: 25088 | elapsed time per iteration (ms): 13524.3 | learning rate: 6.954E-06 | global batch size: 16 | lm loss: 6.959779E+00 | loss scale: 32768.0 | grad 
norm: 217321.763 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1569/ 159576 | consumed samples: 25104 | elapsed time per iteration (ms): 13601.8 | learning rate: 6.959E-06 | global batch size: 16 | lm loss: 7.177199E+00 | loss scale: 32768.0 | grad norm: 254297.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1570/ 159576 | consumed samples: 25120 | elapsed time per iteration (ms): 13589.9 | learning rate: 6.963E-06 | global batch size: 16 | lm loss: 7.113214E+00 | loss scale: 32768.0 | grad norm: 172729.515 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1571/ 159576 | consumed samples: 25136 | elapsed time per iteration (ms): 13658.1 | learning rate: 6.967E-06 | global batch size: 16 | lm loss: 7.054616E+00 | loss scale: 32768.0 | grad norm: 176859.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1572/ 159576 | consumed samples: 25152 | elapsed time per iteration (ms): 13798.6 | learning rate: 6.972E-06 | global batch size: 16 | lm loss: 7.111713E+00 | loss scale: 32768.0 | grad norm: 165282.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1573/ 159576 | consumed samples: 25168 | elapsed time per iteration (ms): 13684.6 | learning rate: 6.976E-06 | global batch size: 16 | lm loss: 7.324330E+00 | loss scale: 32768.0 | grad norm: 205395.896 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1574/ 159576 | consumed samples: 25184 | elapsed time per iteration (ms): 13612.3 | learning rate: 6.981E-06 | global batch size: 16 | lm loss: 7.139562E+00 | loss scale: 32768.0 | grad norm: 201180.686 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1575/ 159576 | consumed samples: 25200 | elapsed time per iteration (ms): 13567.2 | learning rate: 6.985E-06 | global batch size: 16 | lm loss: 7.063004E+00 | loss scale: 32768.0 | grad norm: 126181.509 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1576/ 159576 | consumed samples: 25216 | elapsed time per iteration (ms): 13982.4 | learning rate: 6.990E-06 | global batch size: 16 | lm loss: 7.030066E+00 | loss scale: 32768.0 | grad norm: 261758.694 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1577/ 159576 | consumed samples: 25232 | elapsed time per iteration (ms): 13552.2 | learning rate: 6.994E-06 | global batch size: 16 | lm loss: 7.129750E+00 | loss scale: 32768.0 | grad norm: 133747.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1578/ 159576 | consumed samples: 25248 | elapsed time per iteration (ms): 13576.0 | learning rate: 6.999E-06 | global batch size: 16 | lm loss: 7.478085E+00 | loss scale: 32768.0 | grad norm: 193421.594 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1579/ 159576 | consumed samples: 25264 | elapsed time per iteration (ms): 13627.7 | learning rate: 7.003E-06 | global batch size: 16 | lm loss: 7.062607E+00 | loss scale: 32768.0 | grad norm: 162309.186 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1580/ 159576 | consumed samples: 25280 | elapsed time per iteration 
(ms): 13870.0 | learning rate: 7.007E-06 | global batch size: 16 | lm loss: 6.734056E+00 | loss scale: 32768.0 | grad norm: 233732.101 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1581/ 159576 | consumed samples: 25296 | elapsed time per iteration (ms): 13680.5 | learning rate: 7.012E-06 | global batch size: 16 | lm loss: 7.360079E+00 | loss scale: 32768.0 | grad norm: 189405.056 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1582/ 159576 | consumed samples: 25312 | elapsed time per iteration (ms): 13679.9 | learning rate: 7.016E-06 | global batch size: 16 | lm loss: 7.291443E+00 | loss scale: 32768.0 | grad norm: 159639.849 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1583/ 159576 | consumed samples: 25328 | elapsed time per iteration (ms): 13579.9 | learning rate: 7.021E-06 | global batch size: 16 | lm loss: 7.361541E+00 | loss scale: 32768.0 | grad norm: 178947.980 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1584/ 159576 | consumed samples: 25344 | elapsed time per iteration (ms): 13614.6 | learning rate: 7.025E-06 | global batch size: 16 | lm loss: 7.145397E+00 | loss scale: 32768.0 | grad norm: 198293.827 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1585/ 159576 | consumed samples: 25360 | elapsed time per iteration (ms): 13943.5 | learning rate: 7.030E-06 | global batch size: 16 | lm loss: 7.009763E+00 | loss scale: 32768.0 | grad norm: 172995.962 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1586/ 159576 | consumed samples: 25376 | elapsed time per iteration (ms): 13665.6 | learning rate: 7.034E-06 | global batch size: 16 | lm loss: 7.306109E+00 | loss scale: 32768.0 | grad norm: 193555.142 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1587/ 159576 | consumed samples: 25392 | elapsed time per iteration (ms): 13713.0 | learning rate: 7.038E-06 | global batch size: 16 | lm loss: 7.341703E+00 | loss scale: 32768.0 | grad norm: 240981.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1588/ 159576 | consumed samples: 25408 | elapsed time per iteration (ms): 13685.0 | learning rate: 7.043E-06 | global batch size: 16 | lm loss: 7.076401E+00 | loss scale: 32768.0 | grad norm: 144170.844 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1589/ 159576 | consumed samples: 25424 | elapsed time per iteration (ms): 13990.2 | learning rate: 7.047E-06 | global batch size: 16 | lm loss: 7.016201E+00 | loss scale: 32768.0 | grad norm: 215101.083 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1590/ 159576 | consumed samples: 25440 | elapsed time per iteration (ms): 13615.2 | learning rate: 7.052E-06 | global batch size: 16 | lm loss: 7.248097E+00 | loss scale: 32768.0 | grad norm: 183674.866 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1591/ 159576 | consumed samples: 25456 | elapsed time per iteration (ms): 13603.7 | learning rate: 7.056E-06 | global batch size: 16 | lm loss: 7.274388E+00 | loss scale: 32768.0 | grad norm: 194912.772 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1592/ 159576 | consumed samples: 25472 | elapsed time per iteration (ms): 13589.1 | learning rate: 7.061E-06 | global batch size: 16 | lm loss: 7.189001E+00 | loss scale: 32768.0 | grad norm: 178991.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1593/ 159576 | consumed samples: 25488 | elapsed time per iteration (ms): 13610.8 | learning rate: 7.065E-06 | global batch size: 16 | lm loss: 7.232603E+00 | loss scale: 32768.0 | grad norm: 152962.889 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1594/ 159576 | consumed samples: 25504 | elapsed time per iteration (ms): 13768.0 | learning rate: 7.070E-06 | global batch size: 16 | lm loss: 7.102697E+00 | loss scale: 32768.0 | grad norm: 144835.907 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1595/ 159576 | consumed samples: 25520 | elapsed time per iteration (ms): 13616.0 | learning rate: 7.074E-06 | global batch size: 16 | lm loss: 7.124231E+00 | loss scale: 32768.0 | grad norm: 492597.129 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1596/ 159576 | consumed samples: 25536 | elapsed time per iteration (ms): 13671.0 | learning rate: 7.078E-06 | global batch size: 16 | lm loss: 7.347673E+00 | loss scale: 32768.0 | grad norm: 283986.803 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1597/ 159576 | consumed samples: 25552 | elapsed time per iteration (ms): 13618.5 | learning rate: 7.083E-06 | global batch size: 16 | lm loss: 7.247316E+00 | loss scale: 32768.0 | grad norm: 185319.173 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1598/ 159576 | consumed samples: 25568 | elapsed time per iteration (ms): 14074.4 | learning rate: 7.087E-06 | global batch size: 16 | lm loss: 7.152137E+00 | loss scale: 32768.0 | grad norm: 179820.746 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1599/ 159576 | consumed samples: 25584 | elapsed time per iteration (ms): 13609.5 | learning rate: 7.092E-06 | global batch size: 16 | lm loss: 7.087896E+00 | loss scale: 32768.0 | grad norm: 178653.073 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1600/ 159576 | consumed samples: 25600 | elapsed time per iteration (ms): 13606.5 | learning rate: 7.096E-06 | global batch size: 16 | lm loss: 7.094335E+00 | loss scale: 32768.0 | grad norm: 197442.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1601/ 159576 | consumed samples: 25616 | elapsed time per iteration (ms): 13605.3 | learning rate: 7.101E-06 | global batch size: 16 | lm loss: 7.230387E+00 | loss scale: 32768.0 | grad norm: 277453.177 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1602/ 159576 | consumed samples: 25632 | elapsed time per iteration (ms): 14026.8 | learning rate: 7.105E-06 | global batch size: 16 | lm loss: 7.399794E+00 | loss scale: 32768.0 | grad norm: 202190.175 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1603/ 159576 | consumed samples: 25648 | elapsed time per iteration (ms): 13782.5 | learning rate: 7.109E-06 | global batch size: 16 | lm loss: 7.261839E+00 | loss scale: 32768.0 | grad norm: 162395.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1604/ 159576 | consumed samples: 25664 | elapsed time per iteration (ms): 13652.4 | learning rate: 7.114E-06 | global batch size: 16 | lm loss: 7.202652E+00 | loss scale: 32768.0 | grad norm: 199798.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1605/ 159576 | consumed samples: 25680 | elapsed time per iteration (ms): 13537.9 | learning rate: 7.118E-06 | global batch size: 16 | lm loss: 7.002069E+00 | loss scale: 32768.0 | grad norm: 200932.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1606/ 159576 | consumed samples: 25696 | elapsed time per iteration (ms): 13623.9 | learning rate: 7.123E-06 | global batch size: 16 | lm loss: 6.994870E+00 | loss scale: 32768.0 | grad norm: 182105.456 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1607/ 159576 | consumed samples: 25712 | elapsed time per iteration (ms): 13778.9 | learning rate: 7.127E-06 | global batch size: 16 | lm loss: 7.236290E+00 | loss scale: 32768.0 | grad norm: 210525.575 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1608/ 159576 | consumed samples: 25728 | elapsed time per iteration (ms): 13614.0 | learning rate: 7.132E-06 | global batch size: 16 | lm loss: 7.271640E+00 | loss scale: 32768.0 | grad norm: 155104.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1609/ 159576 | consumed samples: 25744 | elapsed time per iteration (ms): 13637.4 | learning rate: 7.136E-06 | global batch size: 16 | lm loss: 7.142178E+00 | loss scale: 32768.0 | grad norm: 179013.826 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1610/ 159576 | consumed samples: 25760 | elapsed time per iteration (ms): 13663.2 | learning rate: 7.141E-06 | global batch size: 16 | lm loss: 7.233703E+00 | loss scale: 32768.0 | grad norm: 205415.974 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1611/ 159576 | consumed samples: 25776 | elapsed time per iteration (ms): 14078.6 | learning rate: 7.145E-06 | global batch size: 16 | lm loss: 7.137359E+00 | loss scale: 32768.0 | grad norm: 211115.165 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1612/ 159576 | consumed samples: 25792 | elapsed time per iteration (ms): 13476.7 | learning rate: 7.149E-06 | global batch size: 16 | lm loss: 7.265315E+00 | loss scale: 32768.0 | grad norm: 221323.191 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1613/ 159576 | consumed samples: 25808 | elapsed time per iteration (ms): 13601.4 | learning rate: 7.154E-06 | global batch size: 16 | lm loss: 7.092045E+00 | loss scale: 32768.0 | grad norm: 157009.908 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1614/ 159576 | consumed samples: 25824 | elapsed time per iteration (ms): 13616.6 | learning rate: 7.158E-06 | global batch size: 16 | lm loss: 7.018819E+00 | loss scale: 32768.0 | grad norm: 198533.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1615/ 159576 | consumed samples: 25840 | elapsed time per iteration (ms): 13623.7 | learning rate: 7.163E-06 | global batch size: 16 | lm loss: 7.280205E+00 | loss scale: 32768.0 | grad norm: 288417.013 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1616/ 159576 | consumed samples: 25856 | elapsed time per iteration (ms): 13877.9 | learning rate: 7.167E-06 | global batch size: 16 | lm loss: 7.224732E+00 | loss scale: 32768.0 | grad norm: 186062.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1617/ 159576 | consumed samples: 25872 | elapsed time per iteration (ms): 13663.6 | learning rate: 7.172E-06 | global batch size: 16 | lm loss: 7.238441E+00 | loss scale: 32768.0 | grad norm: 168294.596 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1618/ 159576 | consumed samples: 25888 | elapsed time per iteration (ms): 13675.4 | learning rate: 7.176E-06 | global batch size: 16 | lm loss: 7.159503E+00 | loss scale: 32768.0 | grad norm: 181012.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1619/ 159576 | consumed samples: 25904 | elapsed time per iteration (ms): 13559.3 | learning rate: 7.180E-06 | global batch size: 16 | lm loss: 7.125117E+00 | loss scale: 32768.0 | grad norm: 156261.868 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1620/ 159576 | consumed samples: 25920 | elapsed time per iteration (ms): 14141.4 | learning rate: 7.185E-06 | global batch size: 16 | lm loss: 7.312489E+00 | loss scale: 32768.0 | grad norm: 501804.049 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1621/ 159576 | consumed samples: 25936 | elapsed time per iteration (ms): 13619.8 | learning rate: 7.189E-06 | global batch size: 16 | lm loss: 7.144738E+00 | loss scale: 32768.0 | grad norm: 187512.417 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1622/ 159576 | consumed samples: 25952 | elapsed time per iteration (ms): 13623.1 | learning rate: 7.194E-06 | global batch size: 16 | lm loss: 7.036147E+00 | loss scale: 32768.0 | grad norm: 185668.156 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1623/ 159576 | consumed samples: 25968 | elapsed time per iteration (ms): 13626.1 | learning rate: 7.198E-06 | global batch size: 16 | lm loss: 6.981637E+00 | loss scale: 32768.0 | grad norm: 194478.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1624/ 159576 | consumed samples: 25984 | elapsed time per iteration (ms): 13916.5 | learning rate: 7.203E-06 | global batch size: 16 | lm loss: 7.098595E+00 | loss scale: 32768.0 | grad norm: 176876.504 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1625/ 159576 | consumed samples: 26000 | elapsed time per iteration (ms): 13897.1 | learning rate: 7.207E-06 | global batch size: 16 | lm loss: 7.024785E+00 | loss scale: 32768.0 | grad norm: 133422.500 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1626/ 159576 | consumed samples: 26016 | elapsed time per iteration (ms): 13553.3 | learning rate: 7.212E-06 | global batch size: 16 | lm loss: 7.101878E+00 | loss scale: 32768.0 | grad norm: 187471.535 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
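Editor's note: the records above (and throughout this log) follow Megatron-LM's fixed "field: value | field: value" layout, so a dump like this can be recovered into structured data mechanically. A minimal parsing sketch in Python; the field names are taken verbatim from the log, while the helper name parse_log and the dict layout are illustrative, not part of any training tooling:

    import re

    # One Megatron-style iteration record, e.g.
    # "iteration 1581/ 159576 | consumed samples: 25296 | ... | time (ms)"
    RECORD = re.compile(
        r"iteration\s+(?P<step>\d+)/\s*(?P<total>\d+)\s*\|"
        r".*?consumed samples:\s*(?P<samples>\d+)"
        r".*?elapsed time per iteration \(ms\):\s*(?P<ms>[\d.]+)"
        r".*?learning rate:\s*(?P<lr>[\d.Ee+-]+)"
        r".*?lm loss:\s*(?P<loss>[\d.Ee+-]+)"
        r".*?grad norm:\s*(?P<gnorm>[\d.]+)",
        re.DOTALL,
    )

    def parse_log(text):
        """Yield one dict per iteration record found in a raw log dump."""
        for m in RECORD.finditer(text):
            yield {
                "step": int(m.group("step")),
                "ms": float(m.group("ms")),
                "lr": float(m.group("lr")),
                "loss": float(m.group("loss")),
                "grad_norm": float(m.group("gnorm")),
            }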
iteration 1627/ 159576 | consumed samples: 26032 | elapsed time per iteration (ms): 13608.6 | learning rate: 7.216E-06 | global batch size: 16 | lm loss: 7.083658E+00 | loss scale: 32768.0 | grad norm: 163022.597 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1628/ 159576 | consumed samples: 26048 | elapsed time per iteration (ms): 13598.7 | learning rate: 7.220E-06 | global batch size: 16 | lm loss: 7.128680E+00 | loss scale: 32768.0 | grad norm: 227341.519 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1629/ 159576 | consumed samples: 26064 | elapsed time per iteration (ms): 13737.0 | learning rate: 7.225E-06 | global batch size: 16 | lm loss: 7.226182E+00 | loss scale: 32768.0 | grad norm: 173557.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1630/ 159576 | consumed samples: 26080 | elapsed time per iteration (ms): 13598.4 | learning rate: 7.229E-06 | global batch size: 16 | lm loss: 7.204190E+00 | loss scale: 32768.0 | grad norm: 194336.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1631/ 159576 | consumed samples: 26096 | elapsed time per iteration (ms): 13618.5 | learning rate: 7.234E-06 | global batch size: 16 | lm loss: 7.295867E+00 | loss scale: 32768.0 | grad norm: 218111.651 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1632/ 159576 | consumed samples: 26112 | elapsed time per iteration (ms): 13608.1 | learning rate: 7.238E-06 | global batch size: 16 | lm loss: 7.313629E+00 | loss scale: 32768.0 | grad norm: 150755.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1633/ 159576 | consumed samples: 26128 | elapsed time per iteration (ms): 13926.3 | learning rate: 7.243E-06 | global batch size: 16 | lm loss: 7.105534E+00 | loss scale: 32768.0 | grad norm: 416417.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1634/ 159576 | consumed samples: 26144 | elapsed time per iteration (ms): 13573.4 | learning rate: 7.247E-06 | global batch size: 16 | lm loss: 7.154237E+00 | loss scale: 32768.0 | grad norm: 222886.895 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1635/ 159576 | consumed samples: 26160 | elapsed time per iteration (ms): 13613.9 | learning rate: 7.251E-06 | global batch size: 16 | lm loss: 7.367383E+00 | loss scale: 32768.0 | grad norm: 198928.120 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1636/ 159576 | consumed samples: 26176 | elapsed time per iteration (ms): 13620.0 | learning rate: 7.256E-06 | global batch size: 16 | lm loss: 7.224826E+00 | loss scale: 32768.0 | grad norm: 190490.724 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1637/ 159576 | consumed samples: 26192 | elapsed time per iteration (ms): 13847.4 | learning rate: 7.260E-06 | global batch size: 16 | lm loss: 7.133263E+00 | loss scale: 32768.0 | grad norm: 335044.490 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1638/ 159576 | consumed samples: 26208 | elapsed time per iteration (ms): 13680.4 | learning rate: 7.265E-06 | global batch size: 16 | lm loss: 6.991650E+00 | loss scale: 32768.0 | grad norm: 351935.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1639/ 159576 | consumed samples: 26224 | elapsed time per iteration (ms): 13603.3 | learning rate: 7.269E-06 | global batch size: 16 | lm loss: 7.261710E+00 | loss scale: 32768.0 | grad norm: 162679.611 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1640/ 159576 | consumed samples: 26240 | elapsed time per iteration (ms): 13643.0 | learning rate: 7.274E-06 | global batch size: 16 | lm loss: 7.243075E+00 | loss scale: 32768.0 | grad norm: 139259.853 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1641/ 159576 | consumed samples: 26256 | elapsed time per iteration (ms): 13685.4 | learning rate: 7.278E-06 | global batch size: 16 | lm loss: 7.347486E+00 | loss scale: 32768.0 | grad norm: 190145.472 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1642/ 159576 | consumed samples: 26272 | elapsed time per iteration (ms): 13709.0 | learning rate: 7.283E-06 | global batch size: 16 | lm loss: 7.168586E+00 | loss scale: 32768.0 | grad norm: 250612.086 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1643/ 159576 | consumed samples: 26288 | elapsed time per iteration (ms): 13686.3 | learning rate: 7.287E-06 | global batch size: 16 | lm loss: 7.042645E+00 | loss scale: 32768.0 | grad norm: 181688.669 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1644/ 159576 | consumed samples: 26304 | elapsed time per iteration (ms): 13617.6 | learning rate: 7.291E-06 | global batch size: 16 | lm loss: 6.992811E+00 | loss scale: 32768.0 | grad norm: 173387.997 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1645/ 159576 | consumed samples: 26320 | elapsed time per iteration (ms): 13588.3 | learning rate: 7.296E-06 | global batch size: 16 | lm loss: 6.948548E+00 | loss scale: 32768.0 | grad norm: 204171.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1646/ 159576 | consumed samples: 26336 | elapsed time per iteration (ms): 13943.8 | learning rate: 7.300E-06 | global batch size: 16 | lm loss: 7.227940E+00 | loss scale: 32768.0 | grad norm: 249546.841 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1647/ 159576 | consumed samples: 26352 | elapsed time per iteration (ms): 13526.7 | learning rate: 7.305E-06 | global batch size: 16 | lm loss: 7.150325E+00 | loss scale: 32768.0 | grad norm: 187163.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1648/ 159576 | consumed samples: 26368 | elapsed time per iteration (ms): 13689.1 | learning rate: 7.309E-06 | global batch size: 16 | lm loss: 7.017026E+00 | loss scale: 32768.0 | grad norm: 155331.100 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1649/ 159576 | consumed samples: 26384 | elapsed time per iteration (ms): 13592.0 | learning rate: 7.314E-06 | global batch size: 16 | lm loss: 6.946849E+00 | loss scale: 32768.0 | grad norm: 224463.632 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1650/ 159576 | consumed samples: 26400 | elapsed time per iteration (ms): 13576.3 | learning rate: 7.318E-06 | global batch size: 16 | lm loss: 7.179192E+00 | loss scale: 32768.0 | grad norm: 276611.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1651/ 159576 | consumed samples: 26416 | elapsed time per iteration (ms): 13958.1 | learning rate: 7.322E-06 | global batch size: 16 | lm loss: 7.176366E+00 | loss scale: 32768.0 | grad norm: 180366.507 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1652/ 159576 | consumed samples: 26432 | elapsed time per iteration (ms): 13632.4 | learning rate: 7.327E-06 | global batch size: 16 | lm loss: 7.206745E+00 | loss scale: 32768.0 | grad norm: 135845.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1653/ 159576 | consumed samples: 26448 | elapsed time per iteration (ms): 13613.1 | learning rate: 7.331E-06 | global batch size: 16 | lm loss: 7.259154E+00 | loss scale: 32768.0 | grad norm: 403068.502 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1654/ 159576 | consumed samples: 26464 | elapsed time per iteration (ms): 13593.5 | learning rate: 7.336E-06 | global batch size: 16 | lm loss: 7.201679E+00 | loss scale: 32768.0 | grad norm: 362463.795 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1655/ 159576 | consumed samples: 26480 | elapsed time per iteration (ms): 14016.8 | learning rate: 7.340E-06 | global batch size: 16 | lm loss: 7.291797E+00 | loss scale: 32768.0 | grad norm: 167369.816 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1656/ 159576 | consumed samples: 26496 | elapsed time per iteration (ms): 13699.1 | learning rate: 7.345E-06 | global batch size: 16 | lm loss: 7.091952E+00 | loss scale: 32768.0 | grad norm: 165135.009 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1657/ 159576 | consumed samples: 26512 | elapsed time per iteration (ms): 13569.2 | learning rate: 7.349E-06 | global batch size: 16 | lm loss: 7.068718E+00 | loss scale: 32768.0 | grad norm: 202181.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1658/ 159576 | consumed samples: 26528 | elapsed time per iteration (ms): 13577.2 | learning rate: 7.354E-06 | global batch size: 16 | lm loss: 7.233033E+00 | loss scale: 32768.0 | grad norm: 333361.854 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1659/ 159576 | consumed samples: 26544 | elapsed time per iteration (ms): 13970.5 | learning rate: 7.358E-06 | global batch size: 16 | lm loss: 7.330973E+00 | loss scale: 32768.0 | grad norm: 164401.480 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1660/ 159576 | consumed samples: 26560 | elapsed time per iteration (ms): 13585.6 | learning rate: 7.362E-06 | global batch size: 16 | lm loss: 7.127686E+00 | loss scale: 32768.0 | grad norm: 165830.496 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1661/ 159576 | consumed samples: 26576 | elapsed time per iteration (ms): 13601.7 | learning rate: 7.367E-06 | global batch size: 16 | lm loss: 7.202850E+00 | loss scale: 32768.0 | grad norm: 214035.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1662/ 159576 | consumed samples: 26592 | elapsed time per iteration (ms): 13596.7 | learning rate: 7.371E-06 | global batch size: 16 | lm loss: 7.194968E+00 | loss scale: 32768.0 | grad norm: 269427.808 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1663/ 159576 | consumed samples: 26608 | elapsed time per iteration (ms): 13626.2 | learning rate: 7.376E-06 | global batch size: 16 | lm loss: 7.079875E+00 | loss scale: 32768.0 | grad norm: 243204.527 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1664/ 159576 | consumed samples: 26624 | elapsed time per iteration (ms): 13820.6 | learning rate: 7.380E-06 | global batch size: 16 | lm loss: 7.253979E+00 | loss scale: 32768.0 | grad norm: 184892.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1665/ 159576 | consumed samples: 26640 | elapsed time per iteration (ms): 13606.7 | learning rate: 7.385E-06 | global batch size: 16 | lm loss: 7.021820E+00 | loss scale: 32768.0 | grad norm: 220398.877 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1666/ 159576 | consumed samples: 26656 | elapsed time per iteration (ms): 13594.3 | learning rate: 7.389E-06 | global batch size: 16 | lm loss: 7.115512E+00 | loss scale: 32768.0 | grad norm: 307682.966 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1667/ 159576 | consumed samples: 26672 | elapsed time per iteration (ms): 13584.1 | learning rate: 7.393E-06 | global batch size: 16 | lm loss: 7.301219E+00 | loss scale: 32768.0 | grad norm: 326739.461 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1668/ 159576 | consumed samples: 26688 | elapsed time per iteration (ms): 13934.9 | learning rate: 7.398E-06 | global batch size: 16 | lm loss: 7.091152E+00 | loss scale: 32768.0 | grad norm: 179218.130 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1669/ 159576 | consumed samples: 26704 | elapsed time per iteration (ms): 13576.9 | learning rate: 7.402E-06 | global batch size: 16 | lm loss: 7.060991E+00 | loss scale: 32768.0 | grad norm: 212478.902 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1670/ 159576 | consumed samples: 26720 | elapsed time per iteration (ms): 13622.1 | learning rate: 7.407E-06 | global batch size: 16 | lm loss: 7.225494E+00 | loss scale: 32768.0 | grad norm: 312859.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1671/ 159576 | consumed samples: 26736 | elapsed time per iteration (ms): 13558.9 | learning rate: 7.411E-06 | global batch size: 16 | lm loss: 6.931543E+00 | loss scale: 32768.0 | grad norm: 214910.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1672/ 159576 | consumed samples: 26752 | elapsed time per iteration (ms): 13593.0 | learning rate: 7.416E-06 | global batch size: 16 | lm loss: 7.111391E+00 | loss scale: 32768.0 | grad norm: 167374.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
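Editor's note: at a global batch size of 16 and roughly 13.6 seconds per iteration, the run is advancing at a little over one sample per second. A back-of-the-envelope extrapolation from the two fields present in every record; it freezes the current per-iteration time and batch size and ignores any planned batch-size ramp-up, so treat it as a sanity check, not a schedule:

    # Rough throughput and time-to-completion from the logged fields above.
    def samples_per_second(global_batch_size, ms_per_iter):
        return global_batch_size * 1000.0 / ms_per_iter

    def eta_days(step, total_steps, ms_per_iter):
        return (total_steps - step) * ms_per_iter / 1000.0 / 86400.0

    print(samples_per_second(16, 13600.0))   # ~1.18 samples/s
    print(eta_days(1672, 159576, 13600.0))   # ~24.9 days at this fixed rate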
iteration 1673/ 159576 | consumed samples: 26768 | elapsed time per iteration (ms): 14083.5 | learning rate: 7.420E-06 | global batch size: 16 | lm loss: 7.119873E+00 | loss scale: 32768.0 | grad norm: 207656.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1674/ 159576 | consumed samples: 26784 | elapsed time per iteration (ms): 13580.7 | learning rate: 7.425E-06 | global batch size: 16 | lm loss: 7.190612E+00 | loss scale: 32768.0 | grad norm: 138716.556 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1675/ 159576 | consumed samples: 26800 | elapsed time per iteration (ms): 13560.5 | learning rate: 7.429E-06 | global batch size: 16 | lm loss: 7.118540E+00 | loss scale: 32768.0 | grad norm: 288523.946 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1676/ 159576 | consumed samples: 26816 | elapsed time per iteration (ms): 13591.4 | learning rate: 7.433E-06 | global batch size: 16 | lm loss: 7.228687E+00 | loss scale: 32768.0 | grad norm: 184651.956 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1677/ 159576 | consumed samples: 26832 | elapsed time per iteration (ms): 14019.3 | learning rate: 7.438E-06 | global batch size: 16 | lm loss: 7.062222E+00 | loss scale: 32768.0 | grad norm: 166988.550 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1678/ 159576 | consumed samples: 26848 | elapsed time per iteration (ms): 13663.4 | learning rate: 7.442E-06 | global batch size: 16 | lm loss: 7.206205E+00 | loss scale: 32768.0 | grad norm: 760966.811 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1679/ 159576 | consumed samples: 26864 | elapsed time per iteration (ms): 13583.3 | learning rate: 7.447E-06 | global batch size: 16 | lm loss: 7.183750E+00 | loss scale: 32768.0 | grad norm: 619056.103 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1680/ 159576 | consumed samples: 26880 | elapsed time per iteration (ms): 13598.8 | learning rate: 7.451E-06 | global batch size: 16 | lm loss: 7.188565E+00 | loss scale: 32768.0 | grad norm: 363445.728 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1681/ 159576 | consumed samples: 26896 | elapsed time per iteration (ms): 14083.3 | learning rate: 7.456E-06 | global batch size: 16 | lm loss: 7.135269E+00 | loss scale: 32768.0 | grad norm: 201434.725 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1682/ 159576 | consumed samples: 26912 | elapsed time per iteration (ms): 13432.4 | learning rate: 7.460E-06 | global batch size: 16 | lm loss: 7.080773E+00 | loss scale: 32768.0 | grad norm: 223123.023 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1683/ 159576 | consumed samples: 26928 | elapsed time per iteration (ms): 13629.9 | learning rate: 7.464E-06 | global batch size: 16 | lm loss: 7.018581E+00 | loss scale: 32768.0 | grad norm: 160716.882 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1684/ 159576 | consumed samples: 26944 | elapsed time per iteration (ms): 13543.1 | learning rate: 7.469E-06 | global batch size: 16 | lm loss: 7.045646E+00 | loss scale: 32768.0 | grad norm: 319366.517 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1685/ 159576 | consumed samples: 26960 | elapsed time per iteration (ms): 13556.0 | learning rate: 7.473E-06 | global batch size: 16 | lm loss: 7.139486E+00 | loss scale: 32768.0 | grad norm: 154250.022 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1686/ 159576 | consumed samples: 26976 | elapsed time per iteration (ms): 13875.3 | learning rate: 7.478E-06 | global batch size: 16 | lm loss: 7.146173E+00 | loss scale: 32768.0 | grad norm: 186495.170 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1687/ 159576 | consumed samples: 26992 | elapsed time per iteration (ms): 13583.8 | learning rate: 7.482E-06 | global batch size: 16 | lm loss: 7.207047E+00 | loss scale: 32768.0 | grad norm: 129574.140 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1688/ 159576 | consumed samples: 27008 | elapsed time per iteration (ms): 13590.1 | learning rate: 7.487E-06 | global batch size: 16 | lm loss: 7.150177E+00 | loss scale: 32768.0 | grad norm: 310199.485 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1689/ 159576 | consumed samples: 27024 | elapsed time per iteration (ms): 13636.7 | learning rate: 7.491E-06 | global batch size: 16 | lm loss: 7.136959E+00 | loss scale: 32768.0 | grad norm: 142456.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1690/ 159576 | consumed samples: 27040 | elapsed time per iteration (ms): 13898.3 | learning rate: 7.496E-06 | global batch size: 16 | lm loss: 6.991103E+00 | loss scale: 32768.0 | grad norm: 206942.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1691/ 159576 | consumed samples: 27056 | elapsed time per iteration (ms): 13637.0 | learning rate: 7.500E-06 | global batch size: 16 | lm loss: 7.147140E+00 | loss scale: 32768.0 | grad norm: 297164.074 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1692/ 159576 | consumed samples: 27072 | elapsed time per iteration (ms): 13592.2 | learning rate: 7.504E-06 | global batch size: 16 | lm loss: 7.166695E+00 | loss scale: 32768.0 | grad norm: 174829.948 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1693/ 159576 | consumed samples: 27088 | elapsed time per iteration (ms): 13634.0 | learning rate: 7.509E-06 | global batch size: 16 | lm loss: 7.124074E+00 | loss scale: 32768.0 | grad norm: 356202.604 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1694/ 159576 | consumed samples: 27104 | elapsed time per iteration (ms): 13929.9 | learning rate: 7.513E-06 | global batch size: 16 | lm loss: 7.219958E+00 | loss scale: 32768.0 | grad norm: 288764.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1695/ 159576 | consumed samples: 27120 | elapsed time per iteration (ms): 13812.8 | learning rate: 7.518E-06 | global batch size: 16 | lm loss: 7.030488E+00 | loss scale: 32768.0 | grad norm: 164638.861 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1696/ 159576 | consumed samples: 27136 | elapsed time per iteration (ms): 13601.5 | learning rate: 7.522E-06 | global batch size: 16 | lm loss: 7.288185E+00 | loss scale: 32768.0 | grad norm: 241747.916 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1697/ 159576 | consumed samples: 27152 | elapsed time per iteration (ms): 13619.0 | learning rate: 7.527E-06 | global batch size: 16 | lm loss: 7.110942E+00 | loss scale: 32768.0 | grad norm: 183251.862 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1698/ 159576 | consumed samples: 27168 | elapsed time per iteration (ms): 13580.4 | learning rate: 7.531E-06 | global batch size: 16 | lm loss: 7.096193E+00 | loss scale: 32768.0 | grad norm: 187930.778 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1699/ 159576 | consumed samples: 27184 | elapsed time per iteration (ms): 14055.7 | learning rate: 7.536E-06 | global batch size: 16 | lm loss: 6.976962E+00 | loss scale: 32768.0 | grad norm: 186599.931 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1700/ 159576 | consumed samples: 27200 | elapsed time per iteration (ms): 13642.0 | learning rate: 7.540E-06 | global batch size: 16 | lm loss: 6.916706E+00 | loss scale: 32768.0 | grad norm: 212948.424 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1701/ 159576 | consumed samples: 27216 | elapsed time per iteration (ms): 13615.0 | learning rate: 7.544E-06 | global batch size: 16 | lm loss: 7.194331E+00 | loss scale: 32768.0 | grad norm: 144812.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1702/ 159576 | consumed samples: 27232 | elapsed time per iteration (ms): 13551.3 | learning rate: 7.549E-06 | global batch size: 16 | lm loss: 7.139325E+00 | loss scale: 32768.0 | grad norm: 331590.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1703/ 159576 | consumed samples: 27248 | elapsed time per iteration (ms): 13973.8 | learning rate: 7.553E-06 | global batch size: 16 | lm loss: 7.042914E+00 | loss scale: 32768.0 | grad norm: 195366.856 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1704/ 159576 | consumed samples: 27264 | elapsed time per iteration (ms): 13614.8 | learning rate: 7.558E-06 | global batch size: 16 | lm loss: 7.087082E+00 | loss scale: 32768.0 | grad norm: 217381.135 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1705/ 159576 | consumed samples: 27280 | elapsed time per iteration (ms): 13611.2 | learning rate: 7.562E-06 | global batch size: 16 | lm loss: 7.013979E+00 | loss scale: 32768.0 | grad norm: 198091.797 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1706/ 159576 | consumed samples: 27296 | elapsed time per iteration (ms): 13574.3 | learning rate: 7.567E-06 | global batch size: 16 | lm loss: 7.016004E+00 | loss scale: 32768.0 | grad norm: 222098.009 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1707/ 159576 | consumed samples: 27312 | elapsed time per iteration (ms): 13629.3 | learning rate: 7.571E-06 | global batch size: 16 | lm loss: 7.175000E+00 | loss scale: 32768.0 | grad norm: 409215.441 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1708/ 159576 | consumed samples: 27328 | elapsed time per iteration (ms): 13904.2 | learning rate: 7.575E-06 | global batch size: 16 | lm loss: 7.071371E+00 | loss scale: 32768.0 | grad norm: 273410.975 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1709/ 159576 | consumed samples: 27344 | elapsed time per iteration (ms): 13558.1 | learning rate: 7.580E-06 | global batch size: 16 | lm loss: 7.002718E+00 | loss scale: 32768.0 | grad norm: 197884.964 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1710/ 159576 | consumed samples: 27360 | elapsed time per iteration (ms): 13639.3 | learning rate: 7.584E-06 | global batch size: 16 | lm loss: 7.323861E+00 | loss scale: 32768.0 | grad norm: 172073.111 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1711/ 159576 | consumed samples: 27376 | elapsed time per iteration (ms): 13631.6 | learning rate: 7.589E-06 | global batch size: 16 | lm loss: 6.922392E+00 | loss scale: 32768.0 | grad norm: 326721.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1712/ 159576 | consumed samples: 27392 | elapsed time per iteration (ms): 13982.8 | learning rate: 7.593E-06 | global batch size: 16 | lm loss: 7.148055E+00 | loss scale: 32768.0 | grad norm: 280337.172 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1713/ 159576 | consumed samples: 27408 | elapsed time per iteration (ms): 13635.8 | learning rate: 7.598E-06 | global batch size: 16 | lm loss: 7.088178E+00 | loss scale: 32768.0 | grad norm: 200762.506 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1714/ 159576 | consumed samples: 27424 | elapsed time per iteration (ms): 13581.9 | learning rate: 7.602E-06 | global batch size: 16 | lm loss: 7.096650E+00 | loss scale: 32768.0 | grad norm: 204299.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1715/ 159576 | consumed samples: 27440 | elapsed time per iteration (ms): 13647.6 | learning rate: 7.607E-06 | global batch size: 16 | lm loss: 6.916616E+00 | loss scale: 32768.0 | grad norm: 127407.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1716/ 159576 | consumed samples: 27456 | elapsed time per iteration (ms): 13904.0 | learning rate: 7.611E-06 | global batch size: 16 | lm loss: 7.066643E+00 | loss scale: 32768.0 | grad norm: 371440.502 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1717/ 159576 | consumed samples: 27472 | elapsed time per iteration (ms): 13717.4 | learning rate: 7.615E-06 | global batch size: 16 | lm loss: 7.332389E+00 | loss scale: 32768.0 | grad norm: 403592.093 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1718/ 159576 | consumed samples: 27488 | elapsed time per iteration (ms): 13591.7 | learning rate: 7.620E-06 | global batch size: 16 | lm loss: 7.055027E+00 | loss scale: 32768.0 | grad norm: 200151.647 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1719/ 159576 | consumed samples: 27504 | elapsed time per iteration (ms): 13560.8 | learning rate: 7.624E-06 | global batch size: 16 | lm loss: 7.176567E+00 | loss scale: 32768.0 | grad norm: 144423.577 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
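Editor's note: the constant "loss scale: 32768.0" alongside "number of skipped iterations: 0" and "number of nan iterations: 0" is the signature of fp16 dynamic loss scaling: the loss is multiplied by a large factor so small gradients survive half precision, the optimizer step is skipped and the scale halved whenever an overflow is detected, and the scale is raised again after a long run of clean steps. A schematic of that update rule; the class name and constants are illustrative, not the exact Megatron/DeepSpeed implementation:

    # Schematic fp16 dynamic loss scaler (illustrative, simplified).
    class DynamicLossScale:
        def __init__(self, init_scale=2.0 ** 15, growth_interval=1000):
            self.scale = init_scale        # 2**15 = 32768.0, as logged above
            self.growth_interval = growth_interval
            self.good_steps = 0

        def update(self, found_overflow):
            """Return True if the optimizer step should be applied."""
            if found_overflow:
                self.scale = max(self.scale / 2.0, 1.0)  # back off
                self.good_steps = 0
                return False   # counted as a "skipped iteration" in the log
            self.good_steps += 1
            if self.good_steps % self.growth_interval == 0:
                self.scale *= 2.0          # probe a larger scale again
            return True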
iteration 1720/ 159576 | consumed samples: 27520 | elapsed time per iteration (ms): 13600.7 | learning rate: 7.629E-06 | global batch size: 16 | lm loss: 6.984463E+00 | loss scale: 32768.0 | grad norm: 303766.844 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1721/ 159576 | consumed samples: 27536 | elapsed time per iteration (ms): 13892.8 | learning rate: 7.633E-06 | global batch size: 16 | lm loss: 6.990324E+00 | loss scale: 32768.0 | grad norm: 154861.936 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1722/ 159576 | consumed samples: 27552 | elapsed time per iteration (ms): 13527.0 | learning rate: 7.638E-06 | global batch size: 16 | lm loss: 7.238751E+00 | loss scale: 32768.0 | grad norm: 231731.625 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1723/ 159576 | consumed samples: 27568 | elapsed time per iteration (ms): 13536.8 | learning rate: 7.642E-06 | global batch size: 16 | lm loss: 7.130395E+00 | loss scale: 32768.0 | grad norm: 190824.462 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1724/ 159576 | consumed samples: 27584 | elapsed time per iteration (ms): 13580.6 | learning rate: 7.646E-06 | global batch size: 16 | lm loss: 7.182058E+00 | loss scale: 32768.0 | grad norm: 266208.840 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1725/ 159576 | consumed samples: 27600 | elapsed time per iteration (ms): 13961.0 | learning rate: 7.651E-06 | global batch size: 16 | lm loss: 7.108085E+00 | loss scale: 32768.0 | grad norm: 284420.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1726/ 159576 | consumed samples: 27616 | elapsed time per iteration (ms): 13537.5 | learning rate: 7.655E-06 | global batch size: 16 | lm loss: 7.049166E+00 | loss scale: 32768.0 | grad norm: 189929.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1727/ 159576 | consumed samples: 27632 | elapsed time per iteration (ms): 13583.4 | learning rate: 7.660E-06 | global batch size: 16 | lm loss: 7.012967E+00 | loss scale: 32768.0 | grad norm: 174720.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1728/ 159576 | consumed samples: 27648 | elapsed time per iteration (ms): 13605.5 | learning rate: 7.664E-06 | global batch size: 16 | lm loss: 7.237570E+00 | loss scale: 32768.0 | grad norm: 194798.770 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1729/ 159576 | consumed samples: 27664 | elapsed time per iteration (ms): 13552.5 | learning rate: 7.669E-06 | global batch size: 16 | lm loss: 7.138112E+00 | loss scale: 32768.0 | grad norm: 289252.424 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1730/ 159576 | consumed samples: 27680 | elapsed time per iteration (ms): 14055.9 | learning rate: 7.673E-06 | global batch size: 16 | lm loss: 7.041800E+00 | loss scale: 32768.0 | grad norm: 190020.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1731/ 159576 | consumed samples: 27696 | elapsed time per iteration (ms): 13571.4 | learning rate: 7.678E-06 | global batch size: 16 | lm loss: 7.037878E+00 | loss scale: 32768.0 | grad norm: 149538.464 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1732/ 159576 | consumed samples: 27712 | elapsed time per iteration (ms): 13585.4 | learning rate: 7.682E-06 | global batch size: 16 | lm loss: 7.179647E+00 | loss scale: 32768.0 | grad norm: 151351.062 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1733/ 159576 | consumed samples: 27728 | elapsed time per iteration (ms): 13582.2 | learning rate: 7.686E-06 | global batch size: 16 | lm loss: 7.234662E+00 | loss scale: 32768.0 | grad norm: 317716.715 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1734/ 159576 | consumed samples: 27744 | elapsed time per iteration (ms): 14148.8 | learning rate: 7.691E-06 | global batch size: 16 | lm loss: 7.306998E+00 | loss scale: 32768.0 | grad norm: 216190.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1735/ 159576 | consumed samples: 27760 | elapsed time per iteration (ms): 13664.2 | learning rate: 7.695E-06 | global batch size: 16 | lm loss: 7.130812E+00 | loss scale: 32768.0 | grad norm: 168041.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1736/ 159576 | consumed samples: 27776 | elapsed time per iteration (ms): 13539.2 | learning rate: 7.700E-06 | global batch size: 16 | lm loss: 7.164721E+00 | loss scale: 32768.0 | grad norm: 189764.472 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1737/ 159576 | consumed samples: 27792 | elapsed time per iteration (ms): 13580.1 | learning rate: 7.704E-06 | global batch size: 16 | lm loss: 7.213598E+00 | loss scale: 32768.0 | grad norm: 231432.124 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1738/ 159576 | consumed samples: 27808 | elapsed time per iteration (ms): 13874.0 | learning rate: 7.709E-06 | global batch size: 16 | lm loss: 7.064263E+00 | loss scale: 32768.0 | grad norm: 332299.668 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1739/ 159576 | consumed samples: 27824 | elapsed time per iteration (ms): 13542.8 | learning rate: 7.713E-06 | global batch size: 16 | lm loss: 7.187717E+00 | loss scale: 32768.0 | grad norm: 159503.470 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1740/ 159576 | consumed samples: 27840 | elapsed time per iteration (ms): 13564.1 | learning rate: 7.717E-06 | global batch size: 16 | lm loss: 7.212025E+00 | loss scale: 32768.0 | grad norm: 275497.658 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1741/ 159576 | consumed samples: 27856 | elapsed time per iteration (ms): 13584.8 | learning rate: 7.722E-06 | global batch size: 16 | lm loss: 6.960712E+00 | loss scale: 32768.0 | grad norm: 307419.828 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1742/ 159576 | consumed samples: 27872 | elapsed time per iteration (ms): 13621.1 | learning rate: 7.726E-06 | global batch size: 16 | lm loss: 7.086576E+00 | loss scale: 32768.0 | grad norm: 156758.997 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1743/ 159576 | consumed samples: 27888 | elapsed time per iteration (ms): 13719.9 | learning rate: 7.731E-06 | global batch size: 16 | lm loss: 6.961288E+00 | loss scale: 32768.0 | grad norm: 147761.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1744/ 159576 | consumed samples: 27904 | elapsed time per iteration (ms): 13570.6 | learning rate: 7.735E-06 | global batch size: 16 | lm loss: 7.320576E+00 | loss scale: 32768.0 | grad norm: 309786.612 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1745/ 159576 | consumed samples: 27920 | elapsed time per iteration (ms): 13600.3 | learning rate: 7.740E-06 | global batch size: 16 | lm loss: 7.218632E+00 | loss scale: 32768.0 | grad norm: 330698.583 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1746/ 159576 | consumed samples: 27936 | elapsed time per iteration (ms): 13548.3 | learning rate: 7.744E-06 | global batch size: 16 | lm loss: 7.139973E+00 | loss scale: 32768.0 | grad norm: 376967.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1747/ 159576 | consumed samples: 27952 | elapsed time per iteration (ms): 13954.3 | learning rate: 7.749E-06 | global batch size: 16 | lm loss: 7.074110E+00 | loss scale: 32768.0 | grad norm: 214147.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1748/ 159576 | consumed samples: 27968 | elapsed time per iteration (ms): 13621.8 | learning rate: 7.753E-06 | global batch size: 16 | lm loss: 7.254288E+00 | loss scale: 32768.0 | grad norm: 128937.522 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1749/ 159576 | consumed samples: 27984 | elapsed time per iteration (ms): 13626.6 | learning rate: 7.757E-06 | global batch size: 16 | lm loss: 7.009082E+00 | loss scale: 32768.0 | grad norm: 392446.478 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1750/ 159576 | consumed samples: 28000 | elapsed time per iteration (ms): 13590.6 | learning rate: 7.762E-06 | global batch size: 16 | lm loss: 6.949193E+00 | loss scale: 32768.0 | grad norm: 205911.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1751/ 159576 | consumed samples: 28016 | elapsed time per iteration (ms): 13916.9 | learning rate: 7.766E-06 | global batch size: 16 | lm loss: 7.175614E+00 | loss scale: 32768.0 | grad norm: 181359.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1752/ 159576 | consumed samples: 28032 | elapsed time per iteration (ms): 13747.5 | learning rate: 7.771E-06 | global batch size: 16 | lm loss: 7.084972E+00 | loss scale: 32768.0 | grad norm: 191810.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1753/ 159576 | consumed samples: 28048 | elapsed time per iteration (ms): 13591.1 | learning rate: 7.775E-06 | global batch size: 16 | lm loss: 7.125815E+00 | loss scale: 32768.0 | grad norm: 150833.632 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1754/ 159576 | consumed samples: 28064 | elapsed time per iteration (ms): 13552.4 | learning rate: 7.780E-06 | global batch size: 16 | lm loss: 7.096021E+00 | loss scale: 32768.0 | grad norm: 858159.626 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1755/ 159576 | consumed samples: 28080 | elapsed time per iteration (ms): 13586.8 | learning rate: 7.784E-06 | global batch size: 16 | lm loss: 7.401230E+00 | loss scale: 32768.0 | grad norm: 1015122.062 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1756/ 159576 | consumed samples: 28096 | elapsed time per iteration (ms): 14062.7 | learning rate: 7.788E-06 | global batch size: 16 | lm loss: 7.141807E+00 | loss scale: 32768.0 | grad norm: 241473.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1757/ 159576 | consumed samples: 28112 | elapsed time per iteration (ms): 13654.9 | learning rate: 7.793E-06 | global batch size: 16 | lm loss: 7.055682E+00 | loss scale: 32768.0 | grad norm: 195258.121 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1758/ 159576 | consumed samples: 28128 | elapsed time per iteration (ms): 13576.6 | learning rate: 7.797E-06 | global batch size: 16 | lm loss: 6.887124E+00 | loss scale: 32768.0 | grad norm: 209948.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1759/ 159576 | consumed samples: 28144 | elapsed time per iteration (ms): 13615.8 | learning rate: 7.802E-06 | global batch size: 16 | lm loss: 7.008955E+00 | loss scale: 32768.0 | grad norm: 218109.807 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1760/ 159576 | consumed samples: 28160 | elapsed time per iteration (ms): 13880.5 | learning rate: 7.806E-06 | global batch size: 16 | lm loss: 7.156555E+00 | loss scale: 32768.0 | grad norm: 199049.119 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1761/ 159576 | consumed samples: 28176 | elapsed time per iteration (ms): 13559.3 | learning rate: 7.811E-06 | global batch size: 16 | lm loss: 7.445184E+00 | loss scale: 32768.0 | grad norm: 571721.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1762/ 159576 | consumed samples: 28192 | elapsed time per iteration (ms): 13597.9 | learning rate: 7.815E-06 | global batch size: 16 | lm loss: 7.408930E+00 | loss scale: 32768.0 | grad norm: 477324.031 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1763/ 159576 | consumed samples: 28208 | elapsed time per iteration (ms): 13646.1 | learning rate: 7.820E-06 | global batch size: 16 | lm loss: 7.228862E+00 | loss scale: 32768.0 | grad norm: 183806.995 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1764/ 159576 | consumed samples: 28224 | elapsed time per iteration (ms): 13595.0 | learning rate: 7.824E-06 | global batch size: 16 | lm loss: 7.213759E+00 | loss scale: 32768.0 | grad norm: 199120.863 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1765/ 159576 | consumed samples: 28240 | elapsed time per iteration (ms): 13787.5 | learning rate: 7.828E-06 | global batch size: 16 | lm loss: 7.190694E+00 | loss scale: 32768.0 | grad norm: 230903.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
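Editor's note: most grad norms in this stretch sit in the low hundreds of thousands, but isolated spikes stand out (858159.626 at iteration 1754, 1015122.062 at 1755, and elevated values again at 1761-1762). A simple robust-threshold pass over parsed records makes such outliers easy to flag; this sketch builds on the hypothetical parse_log helper from earlier and is illustrative only:

    from statistics import median

    def flag_grad_spikes(records, factor=3.0):
        """Report (step, grad_norm) pairs above factor x the median norm."""
        records = list(records)
        cutoff = factor * median(r["grad_norm"] for r in records)
        return [(r["step"], r["grad_norm"])
                for r in records if r["grad_norm"] > cutoff]

    # e.g. flag_grad_spikes(parse_log(open("train.log").read()))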
iteration 1766/ 159576 | consumed samples: 28256 | elapsed time per iteration (ms): 13655.5 | learning rate: 7.833E-06 | global batch size: 16 | lm loss: 7.120300E+00 | loss scale: 32768.0 | grad norm: 189933.714 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1767/ 159576 | consumed samples: 28272 | elapsed time per iteration (ms): 13622.2 | learning rate: 7.837E-06 | global batch size: 16 | lm loss: 7.216377E+00 | loss scale: 32768.0 | grad norm: 411494.652 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1768/ 159576 | consumed samples: 28288 | elapsed time per iteration (ms): 13542.4 | learning rate: 7.842E-06 | global batch size: 16 | lm loss: 7.296084E+00 | loss scale: 32768.0 | grad norm: 211545.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1769/ 159576 | consumed samples: 28304 | elapsed time per iteration (ms): 13835.8 | learning rate: 7.846E-06 | global batch size: 16 | lm loss: 7.014185E+00 | loss scale: 32768.0 | grad norm: 159627.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1770/ 159576 | consumed samples: 28320 | elapsed time per iteration (ms): 13625.8 | learning rate: 7.851E-06 | global batch size: 16 | lm loss: 7.260840E+00 | loss scale: 32768.0 | grad norm: 230760.798 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-24 09:03:50] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition)
[2021-09-24 09:03:50] PULSE: tr8-104B is running for 3:11:39 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8])
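Editor's note: the PULSE status lines show the scheduling side of the run: the next training chunk is queued as a SLURM job array throttled to one concurrent task (the "%1" in 1165978_[1-10%1]), so each array task starts only after the previous one finishes; that is the "dependency mechanism" the first line refers to. A minimal submission sketch from Python; the script name tr8-104B.slurm is assumed, the --array flag is standard SLURM, and the resume-from-checkpoint detail is how such chains are typically wired up rather than something this log states:

    import subprocess

    # Queue ten sequential training chunks: "--array=1-10%1" creates ten
    # array tasks but allows at most one to run at a time, serializing the
    # chunks (each would typically resume from the latest checkpoint).
    subprocess.run(
        ["sbatch", "--array=1-10%1", "tr8-104B.slurm"],  # script name assumed
        check=True,
    )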
iteration 1771/ 159576 | consumed samples: 28336 | elapsed time per iteration (ms): 13609.6 | learning rate: 7.855E-06 | global batch size: 16 | lm loss: 7.096549E+00 | loss scale: 32768.0 | grad norm: 208126.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1772/ 159576 | consumed samples: 28352 | elapsed time per iteration (ms): 13612.5 | learning rate: 7.859E-06 | global batch size: 16 | lm loss: 7.288601E+00 | loss scale: 32768.0 | grad norm: 299861.795 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1773/ 159576 | consumed samples: 28368 | elapsed time per iteration (ms): 14036.1 | learning rate: 7.864E-06 | global batch size: 16 | lm loss: 7.006525E+00 | loss scale: 32768.0 | grad norm: 221185.737 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1774/ 159576 | consumed samples: 28384 | elapsed time per iteration (ms): 13455.1 | learning rate: 7.868E-06 | global batch size: 16 | lm loss: 7.057816E+00 | loss scale: 32768.0 | grad norm: 211669.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1775/ 159576 | consumed samples: 28400 | elapsed time per iteration (ms): 13580.5 | learning rate: 7.873E-06 | global batch size: 16 | lm loss: 7.225205E+00 | loss scale: 32768.0 | grad norm: 232985.961 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1776/ 159576 | consumed samples: 28416 | elapsed time per iteration (ms): 13577.7 | learning rate: 7.877E-06 | global batch size: 16 | lm loss: 7.090505E+00 | loss scale: 32768.0 | grad norm: 148862.985 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1777/ 159576 | consumed samples: 28432 | elapsed time per iteration (ms): 13633.9 | learning rate: 7.882E-06 | global batch size: 16 | lm loss: 7.291343E+00 | loss scale: 32768.0 | grad norm: 241931.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1778/ 159576 | consumed samples: 28448 | elapsed time per iteration (ms): 13810.9 | learning rate: 7.886E-06 | global batch size: 16 | lm loss: 7.168088E+00 | loss scale: 32768.0 | grad norm: 186155.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1779/ 159576 | consumed samples: 28464 | elapsed time per iteration (ms): 13677.6 | learning rate: 7.891E-06 | global batch size: 16 | lm loss: 6.975587E+00 | loss scale: 32768.0 | grad norm: 141385.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1780/ 159576 | consumed samples: 28480 | elapsed time per iteration (ms): 13699.5 | learning rate: 7.895E-06 | global batch size: 16 | lm loss: 7.234455E+00 | loss scale: 32768.0 | grad norm: 167275.043 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1781/ 159576 | consumed samples: 28496 | elapsed time per iteration (ms): 13560.1 | learning rate: 7.899E-06 | global batch size: 16 | lm loss: 7.118816E+00 | loss scale: 32768.0 | grad norm: 185745.557 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1782/ 159576 | consumed samples: 28512 | elapsed time per iteration (ms): 14007.0 | learning rate: 7.904E-06 | global batch size: 16 | lm loss: 7.325441E+00 | loss scale: 32768.0 | grad norm: 151237.535 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1783/ 159576 | consumed samples: 28528 | elapsed time per iteration (ms): 13468.4 | learning rate: 7.908E-06 | global batch size: 16 | lm loss: 6.976577E+00 | loss scale: 32768.0 | grad norm: 157950.458 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1784/ 159576 | consumed samples: 28544 | elapsed time per iteration (ms): 13610.8 | learning rate: 7.913E-06 | global batch size: 16 | lm loss: 7.151215E+00 | loss scale: 32768.0 | grad norm: 185745.960 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1785/ 159576 | consumed samples: 28560 | elapsed time per iteration (ms): 13574.9 | learning rate: 7.917E-06 | global batch size: 16 | lm loss: 6.982706E+00 | loss scale: 32768.0 | grad norm: 212394.757 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1786/ 159576 | consumed samples: 28576 | elapsed time per iteration (ms): 13593.1 | learning rate: 7.922E-06 | global batch size: 16 | lm loss: 7.090255E+00 | loss scale: 32768.0 | grad norm: 165476.788 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1787/ 159576 | consumed samples: 28592 | elapsed time per iteration (ms): 13825.7 | learning rate: 7.926E-06 | global batch size: 16 | lm loss: 7.190539E+00 | loss scale: 32768.0 | grad norm: 105058.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
global batch size: 16 | lm loss: 7.190539E+00 | loss scale: 32768.0 | grad norm: 105058.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1788/ 159576 | consumed samples: 28608 | elapsed time per iteration (ms): 13613.9 | learning rate: 7.930E-06 | global batch size: 16 | lm loss: 6.849520E+00 | loss scale: 32768.0 | grad norm: 180790.521 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1789/ 159576 | consumed samples: 28624 | elapsed time per iteration (ms): 13633.8 | learning rate: 7.935E-06 | global batch size: 16 | lm loss: 7.203046E+00 | loss scale: 32768.0 | grad norm: 126112.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1790/ 159576 | consumed samples: 28640 | elapsed time per iteration (ms): 13618.2 | learning rate: 7.939E-06 | global batch size: 16 | lm loss: 7.073618E+00 | loss scale: 32768.0 | grad norm: 138120.801 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1791/ 159576 | consumed samples: 28656 | elapsed time per iteration (ms): 14044.8 | learning rate: 7.944E-06 | global batch size: 16 | lm loss: 7.193256E+00 | loss scale: 32768.0 | grad norm: 127392.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1792/ 159576 | consumed samples: 28672 | elapsed time per iteration (ms): 13675.9 | learning rate: 7.948E-06 | global batch size: 16 | lm loss: 7.182660E+00 | loss scale: 32768.0 | grad norm: 128828.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1793/ 159576 | consumed samples: 28688 | elapsed time per iteration (ms): 13639.0 | learning rate: 7.953E-06 | global batch size: 16 | lm loss: 7.029709E+00 | loss scale: 32768.0 | grad norm: 123453.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1794/ 159576 | consumed samples: 28704 | elapsed time per iteration (ms): 13728.8 | learning rate: 7.957E-06 | global batch size: 16 | lm loss: 7.166730E+00 | loss scale: 32768.0 | grad norm: 117050.511 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1795/ 159576 | consumed samples: 28720 | elapsed time per iteration (ms): 13951.0 | learning rate: 7.962E-06 | global batch size: 16 | lm loss: 7.100776E+00 | loss scale: 32768.0 | grad norm: 166379.571 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1796/ 159576 | consumed samples: 28736 | elapsed time per iteration (ms): 13626.1 | learning rate: 7.966E-06 | global batch size: 16 | lm loss: 7.059687E+00 | loss scale: 32768.0 | grad norm: 165877.869 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1797/ 159576 | consumed samples: 28752 | elapsed time per iteration (ms): 13658.2 | learning rate: 7.970E-06 | global batch size: 16 | lm loss: 7.128800E+00 | loss scale: 32768.0 | grad norm: 241870.659 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1798/ 159576 | consumed samples: 28768 | elapsed time per iteration (ms): 13547.6 | learning rate: 7.975E-06 | global batch size: 16 | lm loss: 6.884446E+00 | loss scale: 32768.0 | grad norm: 129845.941 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) 
iteration 1799/ 159576 | consumed samples: 28784 | elapsed time per iteration (ms): 13614.6 | learning rate: 7.979E-06 | global batch size: 16 | lm loss: 7.309677E+00 | loss scale: 32768.0 | grad norm: 156206.470 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1800/ 159576 | consumed samples: 28800 | elapsed time per iteration (ms): 13719.1 | learning rate: 7.984E-06 | global batch size: 16 | lm loss: 6.891129E+00 | loss scale: 32768.0 | grad norm: 130612.475 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1801/ 159576 | consumed samples: 28816 | elapsed time per iteration (ms): 13709.3 | learning rate: 7.988E-06 | global batch size: 16 | lm loss: 7.259354E+00 | loss scale: 32768.0 | grad norm: 299631.068 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1802/ 159576 | consumed samples: 28832 | elapsed time per iteration (ms): 13702.3 | learning rate: 7.993E-06 | global batch size: 16 | lm loss: 7.091782E+00 | loss scale: 32768.0 | grad norm: 164547.713 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1803/ 159576 | consumed samples: 28848 | elapsed time per iteration (ms): 13667.9 | learning rate: 7.997E-06 | global batch size: 16 | lm loss: 7.081347E+00 | loss scale: 32768.0 | grad norm: 157884.119 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1804/ 159576 | consumed samples: 28864 | elapsed time per iteration (ms): 14087.7 | learning rate: 8.001E-06 | global batch size: 16 | lm loss: 7.043708E+00 | loss scale: 32768.0 | grad norm: 179047.535 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1805/ 159576 | consumed samples: 28880 | elapsed time per iteration (ms): 13636.0 | learning rate: 8.006E-06 | global batch size: 16 | lm loss: 7.153672E+00 | loss scale: 32768.0 | grad norm: 171473.191 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1806/ 159576 | consumed samples: 28896 | elapsed time per iteration (ms): 13563.1 | learning rate: 8.010E-06 | global batch size: 16 | lm loss: 7.067021E+00 | loss scale: 32768.0 | grad norm: 114434.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1807/ 159576 | consumed samples: 28912 | elapsed time per iteration (ms): 13653.6 | learning rate: 8.015E-06 | global batch size: 16 | lm loss: 7.234491E+00 | loss scale: 32768.0 | grad norm: 149275.670 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1808/ 159576 | consumed samples: 28928 | elapsed time per iteration (ms): 13997.0 | learning rate: 8.019E-06 | global batch size: 16 | lm loss: 7.015783E+00 | loss scale: 32768.0 | grad norm: 179254.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1809/ 159576 | consumed samples: 28944 | elapsed time per iteration (ms): 13813.5 | learning rate: 8.024E-06 | global batch size: 16 | lm loss: 7.176732E+00 | loss scale: 32768.0 | grad norm: 180477.986 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1810/ 159576 | consumed samples: 28960 | elapsed time per iteration (ms): 13672.4 | learning rate: 8.028E-06 | global batch size: 16 | lm loss: 6.590204E+00 | loss scale: 32768.0 
| grad norm: 149127.876 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1811/ 159576 | consumed samples: 28976 | elapsed time per iteration (ms): 13741.3 | learning rate: 8.033E-06 | global batch size: 16 | lm loss: 7.100949E+00 | loss scale: 32768.0 | grad norm: 133004.506 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1812/ 159576 | consumed samples: 28992 | elapsed time per iteration (ms): 13598.0 | learning rate: 8.037E-06 | global batch size: 16 | lm loss: 7.268322E+00 | loss scale: 32768.0 | grad norm: 287887.492 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1813/ 159576 | consumed samples: 29008 | elapsed time per iteration (ms): 13826.0 | learning rate: 8.041E-06 | global batch size: 16 | lm loss: 7.048282E+00 | loss scale: 32768.0 | grad norm: 147045.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1814/ 159576 | consumed samples: 29024 | elapsed time per iteration (ms): 13651.5 | learning rate: 8.046E-06 | global batch size: 16 | lm loss: 7.168237E+00 | loss scale: 32768.0 | grad norm: 167345.880 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1815/ 159576 | consumed samples: 29040 | elapsed time per iteration (ms): 13646.2 | learning rate: 8.050E-06 | global batch size: 16 | lm loss: 6.976926E+00 | loss scale: 32768.0 | grad norm: 173193.629 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1816/ 159576 | consumed samples: 29056 | elapsed time per iteration (ms): 13708.4 | learning rate: 8.055E-06 | global batch size: 16 | lm loss: 7.173286E+00 | loss scale: 32768.0 | grad norm: 156812.836 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1817/ 159576 | consumed samples: 29072 | elapsed time per iteration (ms): 14056.6 | learning rate: 8.059E-06 | global batch size: 16 | lm loss: 7.191895E+00 | loss scale: 32768.0 | grad norm: 254989.804 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1818/ 159576 | consumed samples: 29088 | elapsed time per iteration (ms): 13727.1 | learning rate: 8.064E-06 | global batch size: 16 | lm loss: 7.070405E+00 | loss scale: 32768.0 | grad norm: 128138.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1819/ 159576 | consumed samples: 29104 | elapsed time per iteration (ms): 13606.2 | learning rate: 8.068E-06 | global batch size: 16 | lm loss: 6.955974E+00 | loss scale: 32768.0 | grad norm: 140247.528 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1820/ 159576 | consumed samples: 29120 | elapsed time per iteration (ms): 13652.5 | learning rate: 8.072E-06 | global batch size: 16 | lm loss: 7.029711E+00 | loss scale: 32768.0 | grad norm: 153040.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1821/ 159576 | consumed samples: 29136 | elapsed time per iteration (ms): 13671.5 | learning rate: 8.077E-06 | global batch size: 16 | lm loss: 7.097312E+00 | loss scale: 32768.0 | grad norm: 168364.904 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1822/ 159576 | consumed samples: 29152 | elapsed time per 
iteration (ms): 13964.1 | learning rate: 8.081E-06 | global batch size: 16 | lm loss: 7.163728E+00 | loss scale: 32768.0 | grad norm: 143592.573 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1823/ 159576 | consumed samples: 29168 | elapsed time per iteration (ms): 13677.5 | learning rate: 8.086E-06 | global batch size: 16 | lm loss: 7.161910E+00 | loss scale: 32768.0 | grad norm: 232336.600 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1824/ 159576 | consumed samples: 29184 | elapsed time per iteration (ms): 13682.4 | learning rate: 8.090E-06 | global batch size: 16 | lm loss: 7.241871E+00 | loss scale: 32768.0 | grad norm: 136988.706 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1825/ 159576 | consumed samples: 29200 | elapsed time per iteration (ms): 13681.2 | learning rate: 8.095E-06 | global batch size: 16 | lm loss: 6.885506E+00 | loss scale: 32768.0 | grad norm: 147212.456 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1826/ 159576 | consumed samples: 29216 | elapsed time per iteration (ms): 14107.7 | learning rate: 8.099E-06 | global batch size: 16 | lm loss: 7.094235E+00 | loss scale: 32768.0 | grad norm: 210358.599 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1827/ 159576 | consumed samples: 29232 | elapsed time per iteration (ms): 13698.2 | learning rate: 8.104E-06 | global batch size: 16 | lm loss: 6.987474E+00 | loss scale: 32768.0 | grad norm: 200444.612 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1828/ 159576 | consumed samples: 29248 | elapsed time per iteration (ms): 13646.3 | learning rate: 8.108E-06 | global batch size: 16 | lm loss: 7.024292E+00 | loss scale: 32768.0 | grad norm: 144708.093 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1829/ 159576 | consumed samples: 29264 | elapsed time per iteration (ms): 13672.0 | learning rate: 8.112E-06 | global batch size: 16 | lm loss: 7.101940E+00 | loss scale: 32768.0 | grad norm: 137983.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1830/ 159576 | consumed samples: 29280 | elapsed time per iteration (ms): 13973.1 | learning rate: 8.117E-06 | global batch size: 16 | lm loss: 6.950300E+00 | loss scale: 32768.0 | grad norm: 228570.073 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1831/ 159576 | consumed samples: 29296 | elapsed time per iteration (ms): 13712.1 | learning rate: 8.121E-06 | global batch size: 16 | lm loss: 7.000825E+00 | loss scale: 32768.0 | grad norm: 204009.839 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1832/ 159576 | consumed samples: 29312 | elapsed time per iteration (ms): 13734.6 | learning rate: 8.126E-06 | global batch size: 16 | lm loss: 7.021888E+00 | loss scale: 32768.0 | grad norm: 168698.722 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1833/ 159576 | consumed samples: 29328 | elapsed time per iteration (ms): 13643.1 | learning rate: 8.130E-06 | global batch size: 16 | lm loss: 6.956877E+00 | loss scale: 32768.0 | grad norm: 139702.257 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1834/ 159576 | consumed samples: 29344 | elapsed time per iteration (ms): 13670.0 | learning rate: 8.135E-06 | global batch size: 16 | lm loss: 7.078534E+00 | loss scale: 32768.0 | grad norm: 220188.892 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1835/ 159576 | consumed samples: 29360 | elapsed time per iteration (ms): 13786.5 | learning rate: 8.139E-06 | global batch size: 16 | lm loss: 7.145173E+00 | loss scale: 32768.0 | grad norm: 181620.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1836/ 159576 | consumed samples: 29376 | elapsed time per iteration (ms): 13684.7 | learning rate: 8.143E-06 | global batch size: 16 | lm loss: 7.147571E+00 | loss scale: 32768.0 | grad norm: 148241.508 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1837/ 159576 | consumed samples: 29392 | elapsed time per iteration (ms): 13650.8 | learning rate: 8.148E-06 | global batch size: 16 | lm loss: 7.198610E+00 | loss scale: 32768.0 | grad norm: 129198.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1838/ 159576 | consumed samples: 29408 | elapsed time per iteration (ms): 13689.6 | learning rate: 8.152E-06 | global batch size: 16 | lm loss: 7.077027E+00 | loss scale: 32768.0 | grad norm: 179805.881 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1839/ 159576 | consumed samples: 29424 | elapsed time per iteration (ms): 14193.0 | learning rate: 8.157E-06 | global batch size: 16 | lm loss: 7.034157E+00 | loss scale: 32768.0 | grad norm: 179474.021 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1840/ 159576 | consumed samples: 29440 | elapsed time per iteration (ms): 13593.3 | learning rate: 8.161E-06 | global batch size: 16 | lm loss: 7.132106E+00 | loss scale: 32768.0 | grad norm: 138966.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1841/ 159576 | consumed samples: 29456 | elapsed time per iteration (ms): 13717.8 | learning rate: 8.166E-06 | global batch size: 16 | lm loss: 7.290091E+00 | loss scale: 32768.0 | grad norm: 176321.035 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1842/ 159576 | consumed samples: 29472 | elapsed time per iteration (ms): 13672.3 | learning rate: 8.170E-06 | global batch size: 16 | lm loss: 7.222583E+00 | loss scale: 32768.0 | grad norm: 157190.685 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1843/ 159576 | consumed samples: 29488 | elapsed time per iteration (ms): 14041.0 | learning rate: 8.175E-06 | global batch size: 16 | lm loss: 7.080160E+00 | loss scale: 32768.0 | grad norm: 209951.002 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1844/ 159576 | consumed samples: 29504 | elapsed time per iteration (ms): 13687.6 | learning rate: 8.179E-06 | global batch size: 16 | lm loss: 7.044501E+00 | loss scale: 32768.0 | grad norm: 148871.965 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1845/ 159576 | consumed samples: 29520 | elapsed time per iteration (ms): 13645.6 | learning rate: 8.183E-06 | global 
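Note: a back-of-envelope rate check from the fields above, assuming the ~13.6-13.7 s per iteration and the global batch size of 16 stay roughly constant (both drift in practice):

# Throughput and runtime estimate from the logged fields.
global_batch_size = 16
sec_per_iter = 13.65          # typical "elapsed time per iteration" above
total_iters = 159_576         # from "iteration N/ 159576"

samples_per_sec = global_batch_size / sec_per_iter
eta_days = total_iters * sec_per_iter / 86_400
print(f"{samples_per_sec:.2f} samples/s, ~{eta_days:.0f} days for the full run")
# -> 1.17 samples/s, ~25 days at this batch size and speed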
iteration 1846/ 159576 | consumed samples: 29536 | elapsed time per iteration (ms): 13730.4 | learning rate: 8.188E-06 | global batch size: 16 | lm loss: 6.885038E+00 | loss scale: 32768.0 | grad norm: 152141.636 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1847/ 159576 | consumed samples: 29552 | elapsed time per iteration (ms): 13619.7 | learning rate: 8.192E-06 | global batch size: 16 | lm loss: 7.235194E+00 | loss scale: 32768.0 | grad norm: 176093.463 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1848/ 159576 | consumed samples: 29568 | elapsed time per iteration (ms): 13886.2 | learning rate: 8.197E-06 | global batch size: 16 | lm loss: 7.254928E+00 | loss scale: 32768.0 | grad norm: 205754.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1849/ 159576 | consumed samples: 29584 | elapsed time per iteration (ms): 13743.9 | learning rate: 8.201E-06 | global batch size: 16 | lm loss: 7.040710E+00 | loss scale: 32768.0 | grad norm: 218799.146 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1850/ 159576 | consumed samples: 29600 | elapsed time per iteration (ms): 13589.2 | learning rate: 8.206E-06 | global batch size: 16 | lm loss: 7.048983E+00 | loss scale: 32768.0 | grad norm: 207680.104 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1851/ 159576 | consumed samples: 29616 | elapsed time per iteration (ms): 13643.5 | learning rate: 8.210E-06 | global batch size: 16 | lm loss: 7.264068E+00 | loss scale: 32768.0 | grad norm: 172145.935 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1852/ 159576 | consumed samples: 29632 | elapsed time per iteration (ms): 14007.8 | learning rate: 8.214E-06 | global batch size: 16 | lm loss: 7.091225E+00 | loss scale: 32768.0 | grad norm: 165885.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1853/ 159576 | consumed samples: 29648 | elapsed time per iteration (ms): 13621.7 | learning rate: 8.219E-06 | global batch size: 16 | lm loss: 7.004953E+00 | loss scale: 32768.0 | grad norm: 193763.726 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1854/ 159576 | consumed samples: 29664 | elapsed time per iteration (ms): 13705.7 | learning rate: 8.223E-06 | global batch size: 16 | lm loss: 7.337306E+00 | loss scale: 32768.0 | grad norm: 334165.602 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1855/ 159576 | consumed samples: 29680 | elapsed time per iteration (ms): 13688.7 | learning rate: 8.228E-06 | global batch size: 16 | lm loss: 7.088278E+00 | loss scale: 32768.0 | grad norm: 168305.003 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1856/ 159576 | consumed samples: 29696 | elapsed time per iteration (ms): 14064.4 | learning rate: 8.232E-06 | global batch size: 16 | lm loss: 7.075657E+00 | loss scale: 32768.0 | grad norm: 146104.687 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1857/ 159576 | consumed samples: 29712 | elapsed time per iteration (ms): 13622.8 | learning rate: 8.237E-06 | global batch size: 16 | lm loss: 7.326543E+00 | loss scale: 32768.0 | grad norm: 226986.122 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1858/ 159576 | consumed samples: 29728 | elapsed time per iteration (ms): 13661.1 | learning rate: 8.241E-06 | global batch size: 16 | lm loss: 7.226311E+00 | loss scale: 32768.0 | grad norm: 127252.080 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1859/ 159576 | consumed samples: 29744 | elapsed time per iteration (ms): 13672.4 | learning rate: 8.246E-06 | global batch size: 16 | lm loss: 7.024733E+00 | loss scale: 32768.0 | grad norm: 195136.100 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1860/ 159576 | consumed samples: 29760 | elapsed time per iteration (ms): 13685.6 | learning rate: 8.250E-06 | global batch size: 16 | lm loss: 7.050764E+00 | loss scale: 32768.0 | grad norm: 137697.941 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1861/ 159576 | consumed samples: 29776 | elapsed time per iteration (ms): 13956.5 | learning rate: 8.254E-06 | global batch size: 16 | lm loss: 7.164598E+00 | loss scale: 32768.0 | grad norm: 186285.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1862/ 159576 | consumed samples: 29792 | elapsed time per iteration (ms): 13801.6 | learning rate: 8.259E-06 | global batch size: 16 | lm loss: 6.982927E+00 | loss scale: 32768.0 | grad norm: 155576.175 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1863/ 159576 | consumed samples: 29808 | elapsed time per iteration (ms): 13779.0 | learning rate: 8.263E-06 | global batch size: 16 | lm loss: 6.845668E+00 | loss scale: 32768.0 | grad norm: 211290.875 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1864/ 159576 | consumed samples: 29824 | elapsed time per iteration (ms): 13629.6 | learning rate: 8.268E-06 | global batch size: 16 | lm loss: 7.561100E+00 | loss scale: 32768.0 | grad norm: 177907.854 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1865/ 159576 | consumed samples: 29840 | elapsed time per iteration (ms): 14024.6 | learning rate: 8.272E-06 | global batch size: 16 | lm loss: 7.056180E+00 | loss scale: 32768.0 | grad norm: 132307.729 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1866/ 159576 | consumed samples: 29856 | elapsed time per iteration (ms): 13629.1 | learning rate: 8.277E-06 | global batch size: 16 | lm loss: 7.005206E+00 | loss scale: 32768.0 | grad norm: 140727.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1867/ 159576 | consumed samples: 29872 | elapsed time per iteration (ms): 13680.5 | learning rate: 8.281E-06 | global batch size: 16 | lm loss: 7.008940E+00 | loss scale: 32768.0 | grad norm: 149676.751 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1868/ 159576 | consumed samples: 29888 | elapsed time per iteration (ms): 13661.9 | learning rate: 8.286E-06 | global batch size: 16 | lm loss: 7.154263E+00 | loss scale: 32768.0 | grad norm: 181537.747 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1869/ 159576 | consumed samples: 29904 | elapsed time per iteration (ms): 13705.9 | learning rate: 8.290E-06 | global batch size: 16 | lm loss: 7.144859E+00 | loss scale: 32768.0 | grad norm: 156740.122 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1870/ 159576 | consumed samples: 29920 | elapsed time per iteration (ms): 13994.0 | learning rate: 8.294E-06 | global batch size: 16 | lm loss: 7.053184E+00 | loss scale: 32768.0 | grad norm: 209836.615 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1871/ 159576 | consumed samples: 29936 | elapsed time per iteration (ms): 13623.9 | learning rate: 8.299E-06 | global batch size: 16 | lm loss: 7.033763E+00 | loss scale: 32768.0 | grad norm: 173327.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1872/ 159576 | consumed samples: 29952 | elapsed time per iteration (ms): 13679.1 | learning rate: 8.303E-06 | global batch size: 16 | lm loss: 6.990786E+00 | loss scale: 32768.0 | grad norm: 281336.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1873/ 159576 | consumed samples: 29968 | elapsed time per iteration (ms): 13694.2 | learning rate: 8.308E-06 | global batch size: 16 | lm loss: 7.073781E+00 | loss scale: 32768.0 | grad norm: 124900.526 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1874/ 159576 | consumed samples: 29984 | elapsed time per iteration (ms): 13905.9 | learning rate: 8.312E-06 | global batch size: 16 | lm loss: 7.112270E+00 | loss scale: 32768.0 | grad norm: 168221.159 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1875/ 159576 | consumed samples: 30000 | elapsed time per iteration (ms): 13703.7 | learning rate: 8.317E-06 | global batch size: 16 | lm loss: 7.233196E+00 | loss scale: 32768.0 | grad norm: 174650.162 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1876/ 159576 | consumed samples: 30016 | elapsed time per iteration (ms): 13702.9 | learning rate: 8.321E-06 | global batch size: 16 | lm loss: 6.967190E+00 | loss scale: 32768.0 | grad norm: 177533.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1877/ 159576 | consumed samples: 30032 | elapsed time per iteration (ms): 13717.8 | learning rate: 8.325E-06 | global batch size: 16 | lm loss: 7.208225E+00 | loss scale: 32768.0 | grad norm: 207887.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1878/ 159576 | consumed samples: 30048 | elapsed time per iteration (ms): 14066.9 | learning rate: 8.330E-06 | global batch size: 16 | lm loss: 7.077339E+00 | loss scale: 32768.0 | grad norm: 142338.907 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1879/ 159576 | consumed samples: 30064 | elapsed time per iteration (ms): 13776.6 | learning rate: 8.334E-06 | global batch size: 16 | lm loss: 7.113251E+00 | loss scale: 32768.0 | grad norm: 158300.777 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1880/ 159576 | consumed samples: 30080 | elapsed time per iteration (ms): 13663.2 | learning rate: 8.339E-06 | global batch size: 16 | lm loss: 6.912469E+00 | loss scale: 32768.0 | grad norm: 145353.873 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
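Note: the global batch size has been 16 from the start of the run, since "consumed samples" equals iteration * 16 throughout this excerpt (e.g. 1766 * 16 == 28256). A quick consistency check one can run over parsed records:

# Sanity check implied by the log: with a fixed global batch size of 16,
# consumed samples should advance by exactly 16 per iteration.
def expected_consumed_samples(iteration: int, global_batch_size: int = 16) -> int:
    return iteration * global_batch_size

assert expected_consumed_samples(1766) == 28_256  # matches the record above
assert expected_consumed_samples(1880) == 30_080  # matches the record above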
iteration 1881/ 159576 | consumed samples: 30096 | elapsed time per iteration (ms): 13679.1 | learning rate: 8.343E-06 | global batch size: 16 | lm loss: 7.055939E+00 | loss scale: 32768.0 | grad norm: 337973.880 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1882/ 159576 | consumed samples: 30112 | elapsed time per iteration (ms): 13654.4 | learning rate: 8.348E-06 | global batch size: 16 | lm loss: 6.903512E+00 | loss scale: 32768.0 | grad norm: 240165.485 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1883/ 159576 | consumed samples: 30128 | elapsed time per iteration (ms): 13896.8 | learning rate: 8.352E-06 | global batch size: 16 | lm loss: 7.154733E+00 | loss scale: 32768.0 | grad norm: 145006.968 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1884/ 159576 | consumed samples: 30144 | elapsed time per iteration (ms): 13729.5 | learning rate: 8.357E-06 | global batch size: 16 | lm loss: 7.018287E+00 | loss scale: 32768.0 | grad norm: 447058.582 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1885/ 159576 | consumed samples: 30160 | elapsed time per iteration (ms): 13624.7 | learning rate: 8.361E-06 | global batch size: 16 | lm loss: 7.306771E+00 | loss scale: 32768.0 | grad norm: 269279.686 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1886/ 159576 | consumed samples: 30176 | elapsed time per iteration (ms): 13710.2 | learning rate: 8.365E-06 | global batch size: 16 | lm loss: 7.124641E+00 | loss scale: 32768.0 | grad norm: 184189.442 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1887/ 159576 | consumed samples: 30192 | elapsed time per iteration (ms): 14269.7 | learning rate: 8.370E-06 | global batch size: 16 | lm loss: 7.147641E+00 | loss scale: 32768.0 | grad norm: 240777.486 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1888/ 159576 | consumed samples: 30208 | elapsed time per iteration (ms): 13668.8 | learning rate: 8.374E-06 | global batch size: 16 | lm loss: 7.246544E+00 | loss scale: 32768.0 | grad norm: 221768.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1889/ 159576 | consumed samples: 30224 | elapsed time per iteration (ms): 13682.0 | learning rate: 8.379E-06 | global batch size: 16 | lm loss: 7.042133E+00 | loss scale: 32768.0 | grad norm: 453492.686 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1890/ 159576 | consumed samples: 30240 | elapsed time per iteration (ms): 13683.0 | learning rate: 8.383E-06 | global batch size: 16 | lm loss: 7.161106E+00 | loss scale: 32768.0 | grad norm: 191134.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1891/ 159576 | consumed samples: 30256 | elapsed time per iteration (ms): 14045.3 | learning rate: 8.388E-06 | global batch size: 16 | lm loss: 7.080533E+00 | loss scale: 32768.0 | grad norm: 226207.626 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1892/ 159576 | consumed samples: 30272 | elapsed time per iteration (ms): 13740.4 | learning rate: 8.392E-06 | global batch size: 16 | lm loss: 6.948812E+00 | loss scale: 32768.0 | grad norm: 198329.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1893/ 159576 | consumed samples: 30288 | elapsed time per iteration (ms): 13747.4 | learning rate: 8.396E-06 | global batch size: 16 | lm loss: 7.024124E+00 | loss scale: 32768.0 | grad norm: 332574.173 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1894/ 159576 | consumed samples: 30304 | elapsed time per iteration (ms): 13742.5 | learning rate: 8.401E-06 | global batch size: 16 | lm loss: 7.072248E+00 | loss scale: 32768.0 | grad norm: 351090.950 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1895/ 159576 | consumed samples: 30320 | elapsed time per iteration (ms): 13599.9 | learning rate: 8.405E-06 | global batch size: 16 | lm loss: 6.964484E+00 | loss scale: 32768.0 | grad norm: 180676.580 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1896/ 159576 | consumed samples: 30336 | elapsed time per iteration (ms): 13892.1 | learning rate: 8.410E-06 | global batch size: 16 | lm loss: 7.066601E+00 | loss scale: 32768.0 | grad norm: 186229.787 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1897/ 159576 | consumed samples: 30352 | elapsed time per iteration (ms): 13686.6 | learning rate: 8.414E-06 | global batch size: 16 | lm loss: 6.975677E+00 | loss scale: 32768.0 | grad norm: 145844.159 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1898/ 159576 | consumed samples: 30368 | elapsed time per iteration (ms): 13668.1 | learning rate: 8.419E-06 | global batch size: 16 | lm loss: 7.225606E+00 | loss scale: 32768.0 | grad norm: 229819.640 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1899/ 159576 | consumed samples: 30384 | elapsed time per iteration (ms): 13600.0 | learning rate: 8.423E-06 | global batch size: 16 | lm loss: 7.082514E+00 | loss scale: 32768.0 | grad norm: 185081.109 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1900/ 159576 | consumed samples: 30400 | elapsed time per iteration (ms): 14001.2 | learning rate: 8.428E-06 | global batch size: 16 | lm loss: 7.021253E+00 | loss scale: 32768.0 | grad norm: 220377.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1901/ 159576 | consumed samples: 30416 | elapsed time per iteration (ms): 13722.2 | learning rate: 8.432E-06 | global batch size: 16 | lm loss: 7.049896E+00 | loss scale: 32768.0 | grad norm: 166889.016 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1902/ 159576 | consumed samples: 30432 | elapsed time per iteration (ms): 13621.3 | learning rate: 8.436E-06 | global batch size: 16 | lm loss: 6.878879E+00 | loss scale: 32768.0 | grad norm: 145213.866 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1903/ 159576 | consumed samples: 30448 | elapsed time per iteration (ms): 13693.3 | learning rate: 8.441E-06 | global batch size: 16 | lm loss: 6.981446E+00 | loss scale: 32768.0 | grad norm: 385714.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1904/ 159576 | consumed samples: 30464 | elapsed time per iteration (ms): 13924.8 | learning rate: 8.445E-06 | global batch size: 16 | lm loss: 7.065192E+00 | loss scale: 32768.0 | grad norm: 230309.474 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1905/ 159576 | consumed samples: 30480 | elapsed time per iteration (ms): 13762.9 | learning rate: 8.450E-06 | global batch size: 16 | lm loss: 7.016763E+00 | loss scale: 32768.0 | grad norm: 164701.451 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1906/ 159576 | consumed samples: 30496 | elapsed time per iteration (ms): 13644.6 | learning rate: 8.454E-06 | global batch size: 16 | lm loss: 6.935023E+00 | loss scale: 32768.0 | grad norm: 158636.532 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1907/ 159576 | consumed samples: 30512 | elapsed time per iteration (ms): 13659.2 | learning rate: 8.459E-06 | global batch size: 16 | lm loss: 7.008549E+00 | loss scale: 32768.0 | grad norm: 216415.844 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1908/ 159576 | consumed samples: 30528 | elapsed time per iteration (ms): 13777.8 | learning rate: 8.463E-06 | global batch size: 16 | lm loss: 7.210999E+00 | loss scale: 32768.0 | grad norm: 201609.115 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1909/ 159576 | consumed samples: 30544 | elapsed time per iteration (ms): 13647.1 | learning rate: 8.467E-06 | global batch size: 16 | lm loss: 7.035434E+00 | loss scale: 32768.0 | grad norm: 157381.108 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1910/ 159576 | consumed samples: 30560 | elapsed time per iteration (ms): 13657.7 | learning rate: 8.472E-06 | global batch size: 16 | lm loss: 7.002993E+00 | loss scale: 32768.0 | grad norm: 137094.187 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1911/ 159576 | consumed samples: 30576 | elapsed time per iteration (ms): 13538.8 | learning rate: 8.476E-06 | global batch size: 16 | lm loss: 6.895042E+00 | loss scale: 32768.0 | grad norm: 201565.995 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1912/ 159576 | consumed samples: 30592 | elapsed time per iteration (ms): 13570.4 | learning rate: 8.481E-06 | global batch size: 16 | lm loss: 7.119932E+00 | loss scale: 32768.0 | grad norm: 191020.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1913/ 159576 | consumed samples: 30608 | elapsed time per iteration (ms): 13960.8 | learning rate: 8.485E-06 | global batch size: 16 | lm loss: 7.021863E+00 | loss scale: 32768.0 | grad norm: 163947.486 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1914/ 159576 | consumed samples: 30624 | elapsed time per iteration (ms): 13571.3 | learning rate: 8.490E-06 | global batch size: 16 | lm loss: 7.255896E+00 | loss scale: 32768.0 | grad norm: 110811.833 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1915/ 159576 | consumed samples: 30640 | elapsed time per iteration (ms): 13592.9 | learning rate: 8.494E-06 | global batch size: 16 | lm loss: 7.058972E+00 | loss scale: 32768.0 | grad norm: 226666.177 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
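Note: the learning rate climbs by roughly 4.4E-09 per step across this span (7.833E-06 at iteration 1766 to 8.849E-06 at 1996), consistent with a linear warmup schedule. A minimal sketch; the peak value and warmup length below are illustrative assumptions chosen only to match that slope, not values recoverable from this excerpt:

# Linear warmup: lr grows by peak_lr / warmup_steps per effective step
# (~4.4e-9 here), then the main schedule takes over (not shown).
def warmup_lr(step: int, peak_lr: float = 6e-5, warmup_steps: int = 13_500) -> float:
    return peak_lr * min(step, warmup_steps) / warmup_steps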
iteration 1916/ 159576 | consumed samples: 30656 | elapsed time per iteration (ms): 13559.3 | learning rate: 8.499E-06 | global batch size: 16 | lm loss: 7.001413E+00 | loss scale: 32768.0 | grad norm: 155562.702 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1917/ 159576 | consumed samples: 30672 | elapsed time per iteration (ms): 13603.1 | learning rate: 8.503E-06 | global batch size: 16 | lm loss: 6.925358E+00 | loss scale: 32768.0 | grad norm: 153599.875 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1918/ 159576 | consumed samples: 30688 | elapsed time per iteration (ms): 13848.6 | learning rate: 8.507E-06 | global batch size: 16 | lm loss: 7.013722E+00 | loss scale: 32768.0 | grad norm: 151847.788 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1919/ 159576 | consumed samples: 30704 | elapsed time per iteration (ms): 13580.7 | learning rate: 8.512E-06 | global batch size: 16 | lm loss: 7.057837E+00 | loss scale: 32768.0 | grad norm: 149268.841 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1920/ 159576 | consumed samples: 30720 | elapsed time per iteration (ms): 13579.6 | learning rate: 8.516E-06 | global batch size: 16 | lm loss: 7.059657E+00 | loss scale: 32768.0 | grad norm: 211843.149 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1921/ 159576 | consumed samples: 30736 | elapsed time per iteration (ms): 13716.2 | learning rate: 8.521E-06 | global batch size: 16 | lm loss: 7.145122E+00 | loss scale: 32768.0 | grad norm: 158831.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1922/ 159576 | consumed samples: 30752 | elapsed time per iteration (ms): 14204.8 | learning rate: 8.525E-06 | global batch size: 16 | lm loss: 7.012016E+00 | loss scale: 32768.0 | grad norm: 142219.675 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1923/ 159576 | consumed samples: 30768 | elapsed time per iteration (ms): 13586.3 | learning rate: 8.530E-06 | global batch size: 16 | lm loss: 6.958722E+00 | loss scale: 32768.0 | grad norm: 147958.053 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1924/ 159576 | consumed samples: 30784 | elapsed time per iteration (ms): 13654.4 | learning rate: 8.534E-06 | global batch size: 16 | lm loss: 6.916204E+00 | loss scale: 32768.0 | grad norm: 168316.999 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1925/ 159576 | consumed samples: 30800 | elapsed time per iteration (ms): 13581.4 | learning rate: 8.538E-06 | global batch size: 16 | lm loss: 7.208139E+00 | loss scale: 32768.0 | grad norm: 186895.870 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1926/ 159576 | consumed samples: 30816 | elapsed time per iteration (ms): 14057.7 | learning rate: 8.543E-06 | global batch size: 16 | lm loss: 6.921901E+00 | loss scale: 32768.0 | grad norm: 136886.936 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1927/ 159576 | consumed samples: 30832 | elapsed time per iteration (ms): 13553.3 | learning rate: 8.547E-06 | global batch size: 16 | lm loss: 7.044703E+00 | loss scale: 32768.0 | grad norm: 318519.845 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1928/ 159576 | consumed samples: 30848 | elapsed time per iteration (ms): 13594.1 | learning rate: 8.552E-06 | global batch size: 16 | lm loss: 6.906800E+00 | loss scale: 32768.0 | grad norm: 155021.065 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1929/ 159576 | consumed samples: 30864 | elapsed time per iteration (ms): 13607.1 | learning rate: 8.556E-06 | global batch size: 16 | lm loss: 6.881465E+00 | loss scale: 32768.0 | grad norm: 190717.011 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1930/ 159576 | consumed samples: 30880 | elapsed time per iteration (ms): 13551.6 | learning rate: 8.561E-06 | global batch size: 16 | lm loss: 7.199529E+00 | loss scale: 32768.0 | grad norm: 191859.870 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1931/ 159576 | consumed samples: 30896 | elapsed time per iteration (ms): 13806.2 | learning rate: 8.565E-06 | global batch size: 16 | lm loss: 6.954100E+00 | loss scale: 32768.0 | grad norm: 130775.699 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1932/ 159576 | consumed samples: 30912 | elapsed time per iteration (ms): 13613.1 | learning rate: 8.570E-06 | global batch size: 16 | lm loss: 6.704428E+00 | loss scale: 32768.0 | grad norm: 137607.979 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1933/ 159576 | consumed samples: 30928 | elapsed time per iteration (ms): 13506.4 | learning rate: 8.574E-06 | global batch size: 16 | lm loss: 7.014212E+00 | loss scale: 32768.0 | grad norm: 186579.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1934/ 159576 | consumed samples: 30944 | elapsed time per iteration (ms): 13520.6 | learning rate: 8.578E-06 | global batch size: 16 | lm loss: 7.012688E+00 | loss scale: 32768.0 | grad norm: 155464.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1935/ 159576 | consumed samples: 30960 | elapsed time per iteration (ms): 13855.4 | learning rate: 8.583E-06 | global batch size: 16 | lm loss: 7.011374E+00 | loss scale: 32768.0 | grad norm: 128570.064 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1936/ 159576 | consumed samples: 30976 | elapsed time per iteration (ms): 13483.8 | learning rate: 8.587E-06 | global batch size: 16 | lm loss: 6.823971E+00 | loss scale: 32768.0 | grad norm: 185286.139 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1937/ 159576 | consumed samples: 30992 | elapsed time per iteration (ms): 13455.5 | learning rate: 8.592E-06 | global batch size: 16 | lm loss: 7.002713E+00 | loss scale: 32768.0 | grad norm: 168834.944 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1938/ 159576 | consumed samples: 31008 | elapsed time per iteration (ms): 13488.7 | learning rate: 8.596E-06 | global batch size: 16 | lm loss: 7.308265E+00 | loss scale: 32768.0 | grad norm: 113334.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1939/ 159576 | consumed samples: 31024 | elapsed time per iteration (ms): 13517.8 | learning rate: 8.601E-06 | global batch size: 16 | lm loss: 6.832065E+00 | loss scale: 32768.0 | grad norm: 143617.951 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1940/ 159576 | consumed samples: 31040 | elapsed time per iteration (ms): 13777.8 | learning rate: 8.605E-06 | global batch size: 16 | lm loss: 6.758460E+00 | loss scale: 32768.0 | grad norm: 131000.555 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1941/ 159576 | consumed samples: 31056 | elapsed time per iteration (ms): 13526.9 | learning rate: 8.609E-06 | global batch size: 16 | lm loss: 6.587332E+00 | loss scale: 32768.0 | grad norm: 133270.011 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1942/ 159576 | consumed samples: 31072 | elapsed time per iteration (ms): 13522.3 | learning rate: 8.614E-06 | global batch size: 16 | lm loss: 7.005889E+00 | loss scale: 32768.0 | grad norm: 169934.736 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1943/ 159576 | consumed samples: 31088 | elapsed time per iteration (ms): 13505.7 | learning rate: 8.618E-06 | global batch size: 16 | lm loss: 7.113358E+00 | loss scale: 32768.0 | grad norm: 147469.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1944/ 159576 | consumed samples: 31104 | elapsed time per iteration (ms): 14004.8 | learning rate: 8.623E-06 | global batch size: 16 | lm loss: 6.815184E+00 | loss scale: 32768.0 | grad norm: 129420.793 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1945/ 159576 | consumed samples: 31120 | elapsed time per iteration (ms): 13536.0 | learning rate: 8.627E-06 | global batch size: 16 | lm loss: 6.802580E+00 | loss scale: 32768.0 | grad norm: 206454.023 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1946/ 159576 | consumed samples: 31136 | elapsed time per iteration (ms): 13571.2 | learning rate: 8.632E-06 | global batch size: 16 | lm loss: 6.899452E+00 | loss scale: 32768.0 | grad norm: 159625.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1947/ 159576 | consumed samples: 31152 | elapsed time per iteration (ms): 13512.7 | learning rate: 8.636E-06 | global batch size: 16 | lm loss: 6.902468E+00 | loss scale: 32768.0 | grad norm: 161374.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1948/ 159576 | consumed samples: 31168 | elapsed time per iteration (ms): 13965.3 | learning rate: 8.641E-06 | global batch size: 16 | lm loss: 7.027518E+00 | loss scale: 32768.0 | grad norm: 141898.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1949/ 159576 | consumed samples: 31184 | elapsed time per iteration (ms): 13617.6 | learning rate: 8.645E-06 | global batch size: 16 | lm loss: 6.901030E+00 | loss scale: 32768.0 | grad norm: 115156.669 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1950/ 159576 | consumed samples: 31200 | elapsed time per iteration (ms): 13549.7 | learning rate: 8.649E-06 | global batch size: 16 | lm loss: 7.012411E+00 | loss scale: 32768.0 | grad norm: 364327.043 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1951/ 159576 | consumed samples: 31216 | elapsed time per iteration (ms): 13460.7 | learning rate: 8.654E-06 | global batch size: 16 | lm loss: 6.996010E+00 | loss scale: 32768.0 | grad norm: 265923.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1952/ 159576 | consumed samples: 31232 | elapsed time per iteration (ms): 13574.9 | learning rate: 8.658E-06 | global batch size: 16 | lm loss: 7.002955E+00 | loss scale: 32768.0 | grad norm: 147080.962 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1953/ 159576 | consumed samples: 31248 | elapsed time per iteration (ms): 13782.5 | learning rate: 8.663E-06 | global batch size: 16 | lm loss: 6.930263E+00 | loss scale: 32768.0 | grad norm: 190217.592 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1954/ 159576 | consumed samples: 31264 | elapsed time per iteration (ms): 13515.2 | learning rate: 8.667E-06 | global batch size: 16 | lm loss: 6.835277E+00 | loss scale: 32768.0 | grad norm: 254678.528 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1955/ 159576 | consumed samples: 31280 | elapsed time per iteration (ms): 13569.3 | learning rate: 8.672E-06 | global batch size: 16 | lm loss: 7.283230E+00 | loss scale: 32768.0 | grad norm: 137167.505 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1956/ 159576 | consumed samples: 31296 | elapsed time per iteration (ms): 13592.0 | learning rate: 8.676E-06 | global batch size: 16 | lm loss: 6.895840E+00 | loss scale: 32768.0 | grad norm: 198657.902 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1957/ 159576 | consumed samples: 31312 | elapsed time per iteration (ms): 13906.4 | learning rate: 8.680E-06 | global batch size: 16 | lm loss: 7.127283E+00 | loss scale: 32768.0 | grad norm: 242163.922 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1958/ 159576 | consumed samples: 31328 | elapsed time per iteration (ms): 13647.9 | learning rate: 8.685E-06 | global batch size: 16 | lm loss: 7.022318E+00 | loss scale: 32768.0 | grad norm: 179227.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1959/ 159576 | consumed samples: 31344 | elapsed time per iteration (ms): 13668.0 | learning rate: 8.689E-06 | global batch size: 16 | lm loss: 7.021772E+00 | loss scale: 32768.0 | grad norm: 223437.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1960/ 159576 | consumed samples: 31360 | elapsed time per iteration (ms): 13699.2 | learning rate: 8.694E-06 | global batch size: 16 | lm loss: 7.270517E+00 | loss scale: 32768.0 | grad norm: 166965.849 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1961/ 159576 | consumed samples: 31376 | elapsed time per iteration (ms): 13595.5 | learning rate: 8.698E-06 | global batch size: 16 | lm loss: 6.963766E+00 | loss scale: 32768.0 | grad norm: 257581.689 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1962/ 159576 | consumed samples: 31392 | elapsed time per iteration (ms): 13818.3 | learning rate: 8.703E-06 | global batch size: 16 | lm loss: 6.847409E+00 | loss scale: 32768.0 | grad norm: 162709.033 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1963/ 159576 | consumed samples: 31408 | elapsed time per iteration (ms): 13645.3 | learning rate: 8.707E-06 | global batch size: 16 | lm loss: 6.902783E+00 | loss scale: 32768.0 | grad norm: 186486.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1964/ 159576 | consumed samples: 31424 | elapsed time per iteration (ms): 13637.0 | learning rate: 8.712E-06 | global batch size: 16 | lm loss: 7.112407E+00 | loss scale: 32768.0 | grad norm: 234566.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1965/ 159576 | consumed samples: 31440 | elapsed time per iteration (ms): 13632.5 | learning rate: 8.716E-06 | global batch size: 16 | lm loss: 6.965158E+00 | loss scale: 32768.0 | grad norm: 162405.643 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1966/ 159576 | consumed samples: 31456 | elapsed time per iteration (ms): 13923.2 | learning rate: 8.720E-06 | global batch size: 16 | lm loss: 7.162685E+00 | loss scale: 32768.0 | grad norm: 160740.607 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1967/ 159576 | consumed samples: 31472 | elapsed time per iteration (ms): 13722.5 | learning rate: 8.725E-06 | global batch size: 16 | lm loss: 6.822609E+00 | loss scale: 32768.0 | grad norm: 163162.027 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1968/ 159576 | consumed samples: 31488 | elapsed time per iteration (ms): 13559.9 | learning rate: 8.729E-06 | global batch size: 16 | lm loss: 6.829067E+00 | loss scale: 32768.0 | grad norm: 148991.615 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1969/ 159576 | consumed samples: 31504 | elapsed time per iteration (ms): 13640.6 | learning rate: 8.734E-06 | global batch size: 16 | lm loss: 6.753247E+00 | loss scale: 32768.0 | grad norm: 174635.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1970/ 159576 | consumed samples: 31520 | elapsed time per iteration (ms): 13996.0 | learning rate: 8.738E-06 | global batch size: 16 | lm loss: 7.113372E+00 | loss scale: 32768.0 | grad norm: 278150.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1971/ 159576 | consumed samples: 31536 | elapsed time per iteration (ms): 13669.9 | learning rate: 8.743E-06 | global batch size: 16 | lm loss: 6.872749E+00 | loss scale: 32768.0 | grad norm: 176866.437 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1972/ 159576 | consumed samples: 31552 | elapsed time per iteration (ms): 13634.0 | learning rate: 8.747E-06 | global batch size: 16 | lm loss: 6.944706E+00 | loss scale: 32768.0 | grad norm: 145690.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1973/ 159576 | consumed samples: 31568 | elapsed time per iteration (ms): 13676.3 | learning rate: 8.751E-06 | global batch size: 16 | lm loss: 7.106283E+00 | loss scale: 32768.0 | grad norm: 154568.562 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1974/ 159576 | consumed samples: 31584 | elapsed time per iteration (ms): 13610.0 | learning rate: 8.756E-06 | global batch size: 16 | lm loss: 7.001073E+00 | loss scale: 32768.0 | grad norm: 156908.897 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1975/ 159576 | consumed samples: 31600 | elapsed time per iteration (ms): 13727.1 | learning rate: 8.760E-06 | global batch size: 16 | lm loss: 7.050818E+00 | loss scale: 32768.0 | grad norm: 234696.902 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1976/ 159576 | consumed samples: 31616 | elapsed time per iteration (ms): 13612.3 | learning rate: 8.765E-06 | global batch size: 16 | lm loss: 7.084875E+00 | loss scale: 32768.0 | grad norm: 169650.883 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1977/ 159576 | consumed samples: 31632 | elapsed time per iteration (ms): 13652.4 | learning rate: 8.769E-06 | global batch size: 16 | lm loss: 6.942274E+00 | loss scale: 32768.0 | grad norm: 133422.940 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1978/ 159576 | consumed samples: 31648 | elapsed time per iteration (ms): 13598.6 | learning rate: 8.774E-06 | global batch size: 16 | lm loss: 7.020503E+00 | loss scale: 32768.0 | grad norm: 191046.458 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1979/ 159576 | consumed samples: 31664 | elapsed time per iteration (ms): 6793.7 | learning rate: 8.774E-06 | global batch size: 16 | lm loss: 7.205068E+00 | loss scale: 16384.0 | grad norm: 191046.458 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1980/ 159576 | consumed samples: 31680 | elapsed time per iteration (ms): 13294.9 | learning rate: 8.778E-06 | global batch size: 16 | lm loss: 6.981399E+00 | loss scale: 16384.0 | grad norm: 88750.748 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1981/ 159576 | consumed samples: 31696 | elapsed time per iteration (ms): 13611.4 | learning rate: 8.783E-06 | global batch size: 16 | lm loss: 7.062120E+00 | loss scale: 16384.0 | grad norm: 98643.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1982/ 159576 | consumed samples: 31712 | elapsed time per iteration (ms): 13593.8 | learning rate: 8.787E-06 | global batch size: 16 | lm loss: 6.878181E+00 | loss scale: 16384.0 | grad norm: 67555.616 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1983/ 159576 | consumed samples: 31728 | elapsed time per iteration (ms): 13656.6 | learning rate: 8.791E-06 | global batch size: 16 | lm loss: 6.958256E+00 | loss scale: 16384.0 | grad norm: 79163.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1984/ 159576 | consumed samples: 31744 | elapsed time per iteration (ms): 13863.2 | learning rate: 8.796E-06 | global batch size: 16 | lm loss: 6.850488E+00 | loss scale: 16384.0 | grad norm: 49908.825 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
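Note: at iteration 1979 the loss scale halves from 32768.0 to 16384.0, the learning rate stays frozen at 8.774E-06, the step finishes in about half the usual time, and the previous grad norm (191046.458) is simply repeated; this is the usual signature of a gradient overflow under dynamic fp16 loss scaling, with the optimizer step abandoned, even though the skipped-iteration counter still prints 0 here. The reported grad norms also come down sharply afterwards. A minimal sketch of the mechanism; the growth-interval constant is an illustrative assumption, not the run's actual setting:

# Dynamic loss scaling: halve the scale and skip the step on overflow,
# grow it back after a stretch of clean steps.
class DynamicLossScaler:
    def __init__(self, scale: float = 32_768.0, growth_interval: int = 1_000):
        self.scale = scale
        self.growth_interval = growth_interval
        self._good_steps = 0

    def update(self, found_overflow: bool) -> bool:
        """Return True if the optimizer step should be skipped."""
        if found_overflow:
            self.scale /= 2          # 32768.0 -> 16384.0, as at iteration 1979
            self._good_steps = 0
            return True              # skip the step; the lr does not advance
        self._good_steps += 1
        if self._good_steps % self.growth_interval == 0:
            self.scale *= 2          # ramp back up after a stable stretch
        return False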
iteration 1984/ 159576 | consumed samples: 31744 | elapsed time per iteration (ms): 13863.2 | learning rate: 8.796E-06 | global batch size: 16 | lm loss: 6.850488E+00 | loss scale: 16384.0 | grad norm: 49908.825 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1985/ 159576 | consumed samples: 31760 | elapsed time per iteration (ms): 13625.0 | learning rate: 8.800E-06 | global batch size: 16 | lm loss: 7.227520E+00 | loss scale: 16384.0 | grad norm: 56779.919 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1986/ 159576 | consumed samples: 31776 | elapsed time per iteration (ms): 13644.4 | learning rate: 8.805E-06 | global batch size: 16 | lm loss: 7.002261E+00 | loss scale: 16384.0 | grad norm: 88929.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1987/ 159576 | consumed samples: 31792 | elapsed time per iteration (ms): 13690.4 | learning rate: 8.809E-06 | global batch size: 16 | lm loss: 7.085162E+00 | loss scale: 16384.0 | grad norm: 50454.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1988/ 159576 | consumed samples: 31808 | elapsed time per iteration (ms): 13934.9 | learning rate: 8.814E-06 | global batch size: 16 | lm loss: 6.948382E+00 | loss scale: 16384.0 | grad norm: 95360.624 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1989/ 159576 | consumed samples: 31824 | elapsed time per iteration (ms): 13779.2 | learning rate: 8.818E-06 | global batch size: 16 | lm loss: 6.810514E+00 | loss scale: 16384.0 | grad norm: 64656.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1990/ 159576 | consumed samples: 31840 | elapsed time per iteration (ms): 13639.8 | learning rate: 8.822E-06 | global batch size: 16 | lm loss: 6.904098E+00 | loss scale: 16384.0 | grad norm: 77126.795 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1991/ 159576 | consumed samples: 31856 | elapsed time per iteration (ms): 13559.7 | learning rate: 8.827E-06 | global batch size: 16 | lm loss: 6.833849E+00 | loss scale: 16384.0 | grad norm: 68875.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1992/ 159576 | consumed samples: 31872 | elapsed time per iteration (ms): 13602.8 | learning rate: 8.831E-06 | global batch size: 16 | lm loss: 6.989305E+00 | loss scale: 16384.0 | grad norm: 77647.510 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1993/ 159576 | consumed samples: 31888 | elapsed time per iteration (ms): 13976.7 | learning rate: 8.836E-06 | global batch size: 16 | lm loss: 6.928751E+00 | loss scale: 16384.0 | grad norm: 67757.778 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1994/ 159576 | consumed samples: 31904 | elapsed time per iteration (ms): 13704.1 | learning rate: 8.840E-06 | global batch size: 16 | lm loss: 6.835466E+00 | loss scale: 16384.0 | grad norm: 69187.044 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1995/ 159576 | consumed samples: 31920 | elapsed time per iteration (ms): 13650.9 | learning rate: 8.845E-06 | global batch size: 16 | lm loss: 7.294861E+00 | loss scale: 16384.0 | grad norm: 143539.847 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1996/ 159576 | consumed samples: 31936 | elapsed time per iteration (ms): 13627.5 | learning rate: 8.849E-06
| global batch size: 16 | lm loss: 7.121392E+00 | loss scale: 16384.0 | grad norm: 74325.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1997/ 159576 | consumed samples: 31952 | elapsed time per iteration (ms): 13965.6 | learning rate: 8.854E-06 | global batch size: 16 | lm loss: 7.100244E+00 | loss scale: 16384.0 | grad norm: 92130.836 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1998/ 159576 | consumed samples: 31968 | elapsed time per iteration (ms): 13575.4 | learning rate: 8.858E-06 | global batch size: 16 | lm loss: 7.115794E+00 | loss scale: 16384.0 | grad norm: 85348.558 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1999/ 159576 | consumed samples: 31984 | elapsed time per iteration (ms): 13570.5 | learning rate: 8.862E-06 | global batch size: 16 | lm loss: 6.867460E+00 | loss scale: 16384.0 | grad norm: 112217.552 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-24 09:55:59,653] [INFO] [logging.py:68:log_dist] [Rank 0] step=2000, skipped=2, lr=[8.866863905325443e-06, 8.866863905325443e-06], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 2000 loss: 6.9298 iter time (s): 0.007 samples/sec: 2456.816
iteration 2000/ 159576 | consumed samples: 32000 | elapsed time per iteration (ms): 13633.7 | learning rate: 8.867E-06 | global batch size: 16 | lm loss: 6.929829E+00 | loss scale: 16384.0 | grad norm: 73846.467 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
------------------------------------------------------------------------------------------------
validation loss at iteration 2000 | lm loss value: 7.084489E+00 | lm loss PPL: 1.193313E+03 |
------------------------------------------------------------------------------------------------
iteration 2001/ 159576 | consumed samples: 32016 | elapsed time per iteration (ms): 18999.4 | learning rate: 8.871E-06 | global batch size: 16 | lm loss: 6.882600E+00 | loss scale: 16384.0 | grad norm: 132358.129 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2002/ 159576 | consumed samples: 32032 | elapsed time per iteration (ms): 13626.5 | learning rate: 8.876E-06 | global batch size: 16 | lm loss: 7.231313E+00 | loss scale: 16384.0 | grad norm: 139453.166 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2003/ 159576 | consumed samples: 32048 | elapsed time per iteration (ms): 13687.4 | learning rate: 8.880E-06 | global batch size: 16 | lm loss: 7.034769E+00 | loss scale: 16384.0 | grad norm: 74117.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2004/ 159576 | consumed samples: 32064 | elapsed time per iteration (ms): 13579.3 | learning rate: 8.885E-06 | global batch size: 16 | lm loss: 7.053939E+00 | loss scale: 16384.0 | grad norm: 185455.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2005/ 159576 | consumed samples: 32080 | elapsed time per iteration (ms): 13617.6 | learning rate: 8.889E-06 | global batch size: 16 | lm loss: 6.871277E+00 | loss scale: 16384.0 | grad norm: 117343.684 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
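
Two quick sanity checks on the step-2000 report above, assuming the usual Megatron conventions that the reported PPL is exp(lm loss) and that consumed samples accumulate the global batch size once per iteration:

    import math

    lm_loss = 7.084489                        # validation "lm loss value" above
    print(f"{math.exp(lm_loss):.6E}")         # ~1.193313E+03, the reported "lm loss PPL"

    iteration, global_batch_size = 2000, 16   # batch size is constant at 16 in this excerpt
    print(iteration * global_batch_size)      # 32000, the reported "consumed samples"
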
iteration 2006/ 159576 | consumed samples: 32096 | elapsed time per iteration (ms): 13892.7 | learning rate: 8.893E-06 | global batch size: 16 | lm loss: 6.839181E+00 | loss scale: 16384.0 | grad norm: 77619.124 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2007/ 159576 | consumed samples: 32112 | elapsed time per iteration (ms): 13580.2 | learning rate: 8.898E-06 | global batch size: 16 | lm loss: 7.031313E+00 | loss scale: 16384.0 | grad norm: 111506.485 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2008/ 159576 | consumed samples: 32128 | elapsed time per iteration (ms): 13652.0 | learning rate: 8.902E-06 | global batch size: 16 | lm loss: 6.763354E+00 | loss scale: 16384.0 | grad norm: 74284.647 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2009/ 159576 | consumed samples: 32144 | elapsed time per iteration (ms): 13663.9 | learning rate: 8.907E-06 | global batch size: 16 | lm loss: 7.173141E+00 | loss scale: 16384.0 | grad norm: 176920.841 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2010/ 159576 | consumed samples: 32160 | elapsed time per iteration (ms): 14071.2 | learning rate: 8.911E-06 | global batch size: 16 | lm loss: 6.940368E+00 | loss scale: 16384.0 | grad norm: 136609.771 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2011/ 159576 | consumed samples: 32176 | elapsed time per iteration (ms): 13641.6 | learning rate: 8.916E-06 | global batch size: 16 | lm loss: 7.348205E+00 | loss scale: 16384.0 | grad norm: 74685.726 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2012/ 159576 | consumed samples: 32192 | elapsed time per iteration (ms): 13599.3 | learning rate: 8.920E-06 | global batch size: 16 | lm loss: 6.813260E+00 | loss scale: 16384.0 | grad norm: 98269.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2013/ 159576 | consumed samples: 32208 | elapsed time per iteration (ms): 13658.0 | learning rate: 8.925E-06 | global batch size: 16 | lm loss: 7.088203E+00 | loss scale: 16384.0 | grad norm: 67591.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2014/ 159576 | consumed samples: 32224 | elapsed time per iteration (ms): 14073.3 | learning rate: 8.929E-06 | global batch size: 16 | lm loss: 6.925144E+00 | loss scale: 16384.0 | grad norm: 125518.891 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2015/ 159576 | consumed samples: 32240 | elapsed time per iteration (ms): 13531.4 | learning rate: 8.933E-06 | global batch size: 16 | lm loss: 7.150875E+00 | loss scale: 16384.0 | grad norm: 145833.664 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2016/ 159576 | consumed samples: 32256 | elapsed time per iteration (ms): 13718.9 | learning rate: 8.938E-06 | global batch size: 16 | lm loss: 7.058916E+00 | loss scale: 16384.0 | grad norm: 104576.621 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2017/ 159576 | consumed samples: 32272 | elapsed time per iteration (ms): 13660.3 | learning rate: 8.942E-06 | global batch size: 16 | lm loss: 7.075126E+00 | loss scale: 16384.0 | grad norm: 68969.823 | num zeros: 0.0 | number of skipped iterations: 0 | number of
nan iterations: 0 | time (ms) iteration 2018/ 159576 | consumed samples: 32288 | elapsed time per iteration (ms): 13657.9 | learning rate: 8.947E-06 | global batch size: 16 | lm loss: 7.021468E+00 | loss scale: 16384.0 | grad norm: 102873.081 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2019/ 159576 | consumed samples: 32304 | elapsed time per iteration (ms): 13864.5 | learning rate: 8.951E-06 | global batch size: 16 | lm loss: 7.182456E+00 | loss scale: 16384.0 | grad norm: 83098.867 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2020/ 159576 | consumed samples: 32320 | elapsed time per iteration (ms): 13595.8 | learning rate: 8.956E-06 | global batch size: 16 | lm loss: 7.201014E+00 | loss scale: 16384.0 | grad norm: 86577.891 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2021/ 159576 | consumed samples: 32336 | elapsed time per iteration (ms): 13656.2 | learning rate: 8.960E-06 | global batch size: 16 | lm loss: 7.021406E+00 | loss scale: 16384.0 | grad norm: 81681.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2022/ 159576 | consumed samples: 32352 | elapsed time per iteration (ms): 13573.2 | learning rate: 8.964E-06 | global batch size: 16 | lm loss: 7.084285E+00 | loss scale: 16384.0 | grad norm: 87860.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2023/ 159576 | consumed samples: 32368 | elapsed time per iteration (ms): 13983.6 | learning rate: 8.969E-06 | global batch size: 16 | lm loss: 6.934657E+00 | loss scale: 16384.0 | grad norm: 59691.509 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2024/ 159576 | consumed samples: 32384 | elapsed time per iteration (ms): 13601.4 | learning rate: 8.973E-06 | global batch size: 16 | lm loss: 7.007637E+00 | loss scale: 16384.0 | grad norm: 90222.500 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2025/ 159576 | consumed samples: 32400 | elapsed time per iteration (ms): 13711.5 | learning rate: 8.978E-06 | global batch size: 16 | lm loss: 6.979746E+00 | loss scale: 16384.0 | grad norm: 93849.629 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2026/ 159576 | consumed samples: 32416 | elapsed time per iteration (ms): 13699.6 | learning rate: 8.982E-06 | global batch size: 16 | lm loss: 6.934021E+00 | loss scale: 16384.0 | grad norm: 80041.099 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2027/ 159576 | consumed samples: 32432 | elapsed time per iteration (ms): 14076.1 | learning rate: 8.987E-06 | global batch size: 16 | lm loss: 6.980267E+00 | loss scale: 16384.0 | grad norm: 62895.732 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2028/ 159576 | consumed samples: 32448 | elapsed time per iteration (ms): 13679.2 | learning rate: 8.991E-06 | global batch size: 16 | lm loss: 7.024888E+00 | loss scale: 16384.0 | grad norm: 52171.920 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2029/ 159576 | consumed samples: 32464 | elapsed time per iteration (ms): 13587.5 | learning rate: 8.996E-06 | global batch size: 16 | lm loss: 7.115479E+00 | 
loss scale: 16384.0 | grad norm: 102889.917 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2030/ 159576 | consumed samples: 32480 | elapsed time per iteration (ms): 13601.6 | learning rate: 9.000E-06 | global batch size: 16 | lm loss: 7.058015E+00 | loss scale: 16384.0 | grad norm: 59629.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2031/ 159576 | consumed samples: 32496 | elapsed time per iteration (ms): 13586.5 | learning rate: 9.004E-06 | global batch size: 16 | lm loss: 7.114190E+00 | loss scale: 16384.0 | grad norm: 71212.111 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2032/ 159576 | consumed samples: 32512 | elapsed time per iteration (ms): 13640.1 | learning rate: 9.009E-06 | global batch size: 16 | lm loss: 7.060964E+00 | loss scale: 16384.0 | grad norm: 64723.435 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2033/ 159576 | consumed samples: 32528 | elapsed time per iteration (ms): 13600.9 | learning rate: 9.013E-06 | global batch size: 16 | lm loss: 7.134828E+00 | loss scale: 16384.0 | grad norm: 56762.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2034/ 159576 | consumed samples: 32544 | elapsed time per iteration (ms): 13742.8 | learning rate: 9.018E-06 | global batch size: 16 | lm loss: 7.147020E+00 | loss scale: 16384.0 | grad norm: 116614.867 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2035/ 159576 | consumed samples: 32560 | elapsed time per iteration (ms): 13462.2 | learning rate: 9.022E-06 | global batch size: 16 | lm loss: 7.059257E+00 | loss scale: 16384.0 | grad norm: 95862.140 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2036/ 159576 | consumed samples: 32576 | elapsed time per iteration (ms): 13758.4 | learning rate: 9.027E-06 | global batch size: 16 | lm loss: 7.181802E+00 | loss scale: 16384.0 | grad norm: 60688.861 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2037/ 159576 | consumed samples: 32592 | elapsed time per iteration (ms): 13746.2 | learning rate: 9.031E-06 | global batch size: 16 | lm loss: 6.934449E+00 | loss scale: 16384.0 | grad norm: 71317.463 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2038/ 159576 | consumed samples: 32608 | elapsed time per iteration (ms): 13575.3 | learning rate: 9.036E-06 | global batch size: 16 | lm loss: 7.080134E+00 | loss scale: 16384.0 | grad norm: 57102.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2039/ 159576 | consumed samples: 32624 | elapsed time per iteration (ms): 13585.4 | learning rate: 9.040E-06 | global batch size: 16 | lm loss: 6.727631E+00 | loss scale: 16384.0 | grad norm: 74231.007 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2040/ 159576 | consumed samples: 32640 | elapsed time per iteration (ms): 13615.9 | learning rate: 9.044E-06 | global batch size: 16 | lm loss: 6.939786E+00 | loss scale: 16384.0 | grad norm: 75764.732 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-24 10:05:20 CEST)" skipped: maximum number of running instances reached (3) iteration 2041/ 159576 | consumed samples: 32656 | elapsed time per iteration (ms): 14017.0 | learning rate: 9.049E-06 | global batch size: 16 | lm loss: 6.912762E+00 | loss scale: 16384.0 | grad norm: 60359.456 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2042/ 159576 | consumed samples: 32672 | elapsed time per iteration (ms): 13386.8 | learning rate: 9.053E-06 | global batch size: 16 | lm loss: 6.892349E+00 | loss scale: 16384.0 | grad norm: 68369.533 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-24 10:05:52] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition) [2021-09-24 10:05:52] PULSE: tr8-104B is running for 4:13:41 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8]) iteration 2043/ 159576 | consumed samples: 32688 | elapsed time per iteration (ms): 13496.3 | learning rate: 9.058E-06 | global batch size: 16 | lm loss: 7.106496E+00 | loss scale: 16384.0 | grad norm: 74847.038 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2044/ 159576 | consumed samples: 32704 | elapsed time per iteration (ms): 13461.5 | learning rate: 9.062E-06 | global batch size: 16 | lm loss: 7.101841E+00 | loss scale: 16384.0 | grad norm: 81326.664 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2045/ 159576 | consumed samples: 32720 | elapsed time per iteration (ms): 14029.5 | learning rate: 9.067E-06 | global batch size: 16 | lm loss: 6.818883E+00 | loss scale: 16384.0 | grad norm: 55780.102 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2046/ 159576 | consumed samples: 32736 | elapsed time per iteration (ms): 13528.3 | learning rate: 9.071E-06 | global batch size: 16 | lm loss: 7.344654E+00 | loss scale: 16384.0 | grad norm: 85807.867 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2047/ 159576 | consumed samples: 32752 | elapsed time per iteration (ms): 13633.2 | learning rate: 9.075E-06 | global batch size: 16 | lm loss: 7.041794E+00 | loss scale: 16384.0 | grad norm: 68040.665 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2048/ 159576 | consumed samples: 32768 | elapsed time per iteration (ms): 13714.3 | learning rate: 9.080E-06 | global batch size: 16 | lm loss: 7.051764E+00 | loss scale: 16384.0 | grad norm: 54860.412 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2049/ 159576 | consumed samples: 32784 | elapsed time per iteration (ms): 13991.3 | learning rate: 9.084E-06 | global batch size: 16 | lm loss: 6.824497E+00 | loss scale: 16384.0 | grad norm: 71323.543 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2050/ 159576 | consumed samples: 32800 | elapsed time per iteration (ms): 13606.5 | learning rate: 9.089E-06 
iteration 2050/ 159576 | consumed samples: 32800 | elapsed time per iteration (ms): 13606.5 | learning rate: 9.089E-06 | global batch size: 16 | lm loss: 7.182322E+00 | loss scale: 16384.0 | grad norm: 85719.647 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2051/ 159576 | consumed samples: 32816 | elapsed time per iteration (ms): 13580.8 | learning rate: 9.093E-06 | global batch size: 16 | lm loss: 7.293634E+00 | loss scale: 16384.0 | grad norm: 80588.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2052/ 159576 | consumed samples: 32832 | elapsed time per iteration (ms): 13550.0 | learning rate: 9.098E-06 | global batch size: 16 | lm loss: 7.101615E+00 | loss scale: 16384.0 | grad norm: 84442.652 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2053/ 159576 | consumed samples: 32848 | elapsed time per iteration (ms): 13599.2 | learning rate: 9.102E-06 | global batch size: 16 | lm loss: 7.037670E+00 | loss scale: 16384.0 | grad norm: 66660.579 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2054/ 159576 | consumed samples: 32864 | elapsed time per iteration (ms): 13845.0 | learning rate: 9.107E-06 | global batch size: 16 | lm loss: 7.019003E+00 | loss scale: 16384.0 | grad norm: 62001.885 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2055/ 159576 | consumed samples: 32880 | elapsed time per iteration (ms): 13669.5 | learning rate: 9.111E-06 | global batch size: 16 | lm loss: 6.911786E+00 | loss scale: 16384.0 | grad norm: 117097.154 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2056/ 159576 | consumed samples: 32896 | elapsed time per iteration (ms): 13595.0 | learning rate: 9.115E-06 | global batch size: 16 | lm loss: 7.090348E+00 | loss scale: 16384.0 | grad norm: 84113.874 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2057/ 159576 | consumed samples: 32912 | elapsed time per iteration (ms): 13602.9 | learning rate: 9.120E-06 | global batch size: 16 | lm loss: 6.805397E+00 | loss scale: 16384.0 | grad norm: 74285.496 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2058/ 159576 | consumed samples: 32928 | elapsed time per iteration (ms): 13938.5 | learning rate: 9.124E-06 | global batch size: 16 | lm loss: 7.156925E+00 | loss scale: 16384.0 | grad norm: 123564.876 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2059/ 159576 | consumed samples: 32944 | elapsed time per iteration (ms): 13535.6 | learning rate: 9.129E-06 | global batch size: 16 | lm loss: 7.097910E+00 | loss scale: 16384.0 | grad norm: 80614.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2060/ 159576 | consumed samples: 32960 | elapsed time per iteration (ms): 13561.1 | learning rate: 9.133E-06 | global batch size: 16 | lm loss: 7.173540E+00 | loss scale: 16384.0 | grad norm: 82969.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2061/ 159576 | consumed samples: 32976 | elapsed time per iteration (ms): 13641.0 | learning rate: 9.138E-06 | global batch size: 16 | lm loss: 6.963642E+00 | loss scale: 16384.0 | grad norm: 58968.599 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 
2062/ 159576 | consumed samples: 32992 | elapsed time per iteration (ms): 13737.9 | learning rate: 9.142E-06 | global batch size: 16 | lm loss: 6.932078E+00 | loss scale: 16384.0 | grad norm: 176037.023 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2063/ 159576 | consumed samples: 33008 | elapsed time per iteration (ms): 13779.6 | learning rate: 9.146E-06 | global batch size: 16 | lm loss: 6.904696E+00 | loss scale: 16384.0 | grad norm: 107303.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2064/ 159576 | consumed samples: 33024 | elapsed time per iteration (ms): 13634.2 | learning rate: 9.151E-06 | global batch size: 16 | lm loss: 6.834531E+00 | loss scale: 16384.0 | grad norm: 100378.838 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2065/ 159576 | consumed samples: 33040 | elapsed time per iteration (ms): 13654.1 | learning rate: 9.155E-06 | global batch size: 16 | lm loss: 7.101809E+00 | loss scale: 16384.0 | grad norm: 100637.535 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2066/ 159576 | consumed samples: 33056 | elapsed time per iteration (ms): 13496.2 | learning rate: 9.160E-06 | global batch size: 16 | lm loss: 6.822946E+00 | loss scale: 16384.0 | grad norm: 72463.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2067/ 159576 | consumed samples: 33072 | elapsed time per iteration (ms): 14117.2 | learning rate: 9.164E-06 | global batch size: 16 | lm loss: 7.133995E+00 | loss scale: 16384.0 | grad norm: 265928.980 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2068/ 159576 | consumed samples: 33088 | elapsed time per iteration (ms): 13658.0 | learning rate: 9.169E-06 | global batch size: 16 | lm loss: 7.058832E+00 | loss scale: 16384.0 | grad norm: 225451.637 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2069/ 159576 | consumed samples: 33104 | elapsed time per iteration (ms): 13647.8 | learning rate: 9.173E-06 | global batch size: 16 | lm loss: 6.733691E+00 | loss scale: 16384.0 | grad norm: 109352.478 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2070/ 159576 | consumed samples: 33120 | elapsed time per iteration (ms): 13662.1 | learning rate: 9.178E-06 | global batch size: 16 | lm loss: 7.330385E+00 | loss scale: 16384.0 | grad norm: 106190.502 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2071/ 159576 | consumed samples: 33136 | elapsed time per iteration (ms): 14047.9 | learning rate: 9.182E-06 | global batch size: 16 | lm loss: 6.902629E+00 | loss scale: 16384.0 | grad norm: 105263.547 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2072/ 159576 | consumed samples: 33152 | elapsed time per iteration (ms): 13604.8 | learning rate: 9.186E-06 | global batch size: 16 | lm loss: 7.059223E+00 | loss scale: 16384.0 | grad norm: 156071.065 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2073/ 159576 | consumed samples: 33168 | elapsed time per iteration (ms): 13509.3 | learning rate: 9.191E-06 | global batch size: 16 | lm loss: 6.858756E+00 | loss scale: 16384.0 | grad 
norm: 183069.137 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2074/ 159576 | consumed samples: 33184 | elapsed time per iteration (ms): 13577.0 | learning rate: 9.195E-06 | global batch size: 16 | lm loss: 7.137619E+00 | loss scale: 16384.0 | grad norm: 165868.654 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2075/ 159576 | consumed samples: 33200 | elapsed time per iteration (ms): 13598.1 | learning rate: 9.200E-06 | global batch size: 16 | lm loss: 7.105383E+00 | loss scale: 16384.0 | grad norm: 81641.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2076/ 159576 | consumed samples: 33216 | elapsed time per iteration (ms): 13844.7 | learning rate: 9.204E-06 | global batch size: 16 | lm loss: 6.954556E+00 | loss scale: 16384.0 | grad norm: 90347.722 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2077/ 159576 | consumed samples: 33232 | elapsed time per iteration (ms): 13642.3 | learning rate: 9.209E-06 | global batch size: 16 | lm loss: 6.986308E+00 | loss scale: 16384.0 | grad norm: 71161.614 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2078/ 159576 | consumed samples: 33248 | elapsed time per iteration (ms): 13714.7 | learning rate: 9.213E-06 | global batch size: 16 | lm loss: 7.186345E+00 | loss scale: 16384.0 | grad norm: 125006.131 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2079/ 159576 | consumed samples: 33264 | elapsed time per iteration (ms): 13724.6 | learning rate: 9.217E-06 | global batch size: 16 | lm loss: 7.046529E+00 | loss scale: 16384.0 | grad norm: 72474.668 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2080/ 159576 | consumed samples: 33280 | elapsed time per iteration (ms): 13823.6 | learning rate: 9.222E-06 | global batch size: 16 | lm loss: 6.926587E+00 | loss scale: 16384.0 | grad norm: 72628.016 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2081/ 159576 | consumed samples: 33296 | elapsed time per iteration (ms): 13659.2 | learning rate: 9.226E-06 | global batch size: 16 | lm loss: 6.850713E+00 | loss scale: 16384.0 | grad norm: 78040.610 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2082/ 159576 | consumed samples: 33312 | elapsed time per iteration (ms): 13653.7 | learning rate: 9.231E-06 | global batch size: 16 | lm loss: 7.014567E+00 | loss scale: 16384.0 | grad norm: 88063.955 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2083/ 159576 | consumed samples: 33328 | elapsed time per iteration (ms): 13690.1 | learning rate: 9.235E-06 | global batch size: 16 | lm loss: 6.964838E+00 | loss scale: 16384.0 | grad norm: 68577.460 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2084/ 159576 | consumed samples: 33344 | elapsed time per iteration (ms): 14064.9 | learning rate: 9.240E-06 | global batch size: 16 | lm loss: 6.954602E+00 | loss scale: 16384.0 | grad norm: 70285.947 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2085/ 159576 | consumed samples: 33360 | elapsed time per iteration (ms): 
13835.0 | learning rate: 9.244E-06 | global batch size: 16 | lm loss: 6.952052E+00 | loss scale: 16384.0 | grad norm: 85673.500 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2086/ 159576 | consumed samples: 33376 | elapsed time per iteration (ms): 13813.8 | learning rate: 9.249E-06 | global batch size: 16 | lm loss: 6.909387E+00 | loss scale: 16384.0 | grad norm: 118966.507 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2087/ 159576 | consumed samples: 33392 | elapsed time per iteration (ms): 13678.6 | learning rate: 9.253E-06 | global batch size: 16 | lm loss: 6.961540E+00 | loss scale: 16384.0 | grad norm: 66329.626 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2088/ 159576 | consumed samples: 33408 | elapsed time per iteration (ms): 13699.4 | learning rate: 9.257E-06 | global batch size: 16 | lm loss: 7.038545E+00 | loss scale: 16384.0 | grad norm: 77147.534 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2089/ 159576 | consumed samples: 33424 | elapsed time per iteration (ms): 13870.3 | learning rate: 9.262E-06 | global batch size: 16 | lm loss: 6.829208E+00 | loss scale: 16384.0 | grad norm: 66850.604 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2090/ 159576 | consumed samples: 33440 | elapsed time per iteration (ms): 13553.2 | learning rate: 9.266E-06 | global batch size: 16 | lm loss: 6.885040E+00 | loss scale: 16384.0 | grad norm: 63418.965 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2091/ 159576 | consumed samples: 33456 | elapsed time per iteration (ms): 13563.4 | learning rate: 9.271E-06 | global batch size: 16 | lm loss: 7.227287E+00 | loss scale: 16384.0 | grad norm: 99229.607 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2092/ 159576 | consumed samples: 33472 | elapsed time per iteration (ms): 13616.1 | learning rate: 9.275E-06 | global batch size: 16 | lm loss: 7.151490E+00 | loss scale: 16384.0 | grad norm: 77793.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2093/ 159576 | consumed samples: 33488 | elapsed time per iteration (ms): 14020.5 | learning rate: 9.280E-06 | global batch size: 16 | lm loss: 6.956719E+00 | loss scale: 16384.0 | grad norm: 71078.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2094/ 159576 | consumed samples: 33504 | elapsed time per iteration (ms): 13583.2 | learning rate: 9.284E-06 | global batch size: 16 | lm loss: 6.863022E+00 | loss scale: 16384.0 | grad norm: 75874.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2095/ 159576 | consumed samples: 33520 | elapsed time per iteration (ms): 13540.7 | learning rate: 9.288E-06 | global batch size: 16 | lm loss: 7.230942E+00 | loss scale: 16384.0 | grad norm: 66376.740 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2096/ 159576 | consumed samples: 33536 | elapsed time per iteration (ms): 13617.6 | learning rate: 9.293E-06 | global batch size: 16 | lm loss: 6.938297E+00 | loss scale: 16384.0 | grad norm: 80597.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | time (ms) iteration 2097/ 159576 | consumed samples: 33552 | elapsed time per iteration (ms): 13611.2 | learning rate: 9.297E-06 | global batch size: 16 | lm loss: 6.750860E+00 | loss scale: 16384.0 | grad norm: 50768.159 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2098/ 159576 | consumed samples: 33568 | elapsed time per iteration (ms): 13781.0 | learning rate: 9.302E-06 | global batch size: 16 | lm loss: 6.866726E+00 | loss scale: 16384.0 | grad norm: 120258.979 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2099/ 159576 | consumed samples: 33584 | elapsed time per iteration (ms): 13657.4 | learning rate: 9.306E-06 | global batch size: 16 | lm loss: 6.825637E+00 | loss scale: 16384.0 | grad norm: 95301.455 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2100/ 159576 | consumed samples: 33600 | elapsed time per iteration (ms): 13666.9 | learning rate: 9.311E-06 | global batch size: 16 | lm loss: 6.864701E+00 | loss scale: 16384.0 | grad norm: 68908.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2101/ 159576 | consumed samples: 33616 | elapsed time per iteration (ms): 13629.3 | learning rate: 9.315E-06 | global batch size: 16 | lm loss: 6.992301E+00 | loss scale: 16384.0 | grad norm: 74768.073 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2102/ 159576 | consumed samples: 33632 | elapsed time per iteration (ms): 14067.7 | learning rate: 9.320E-06 | global batch size: 16 | lm loss: 7.044778E+00 | loss scale: 16384.0 | grad norm: 118054.957 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2103/ 159576 | consumed samples: 33648 | elapsed time per iteration (ms): 13615.1 | learning rate: 9.324E-06 | global batch size: 16 | lm loss: 7.033617E+00 | loss scale: 16384.0 | grad norm: 69826.634 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2104/ 159576 | consumed samples: 33664 | elapsed time per iteration (ms): 13577.5 | learning rate: 9.328E-06 | global batch size: 16 | lm loss: 6.970243E+00 | loss scale: 16384.0 | grad norm: 88873.689 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2105/ 159576 | consumed samples: 33680 | elapsed time per iteration (ms): 13581.9 | learning rate: 9.333E-06 | global batch size: 16 | lm loss: 6.917067E+00 | loss scale: 16384.0 | grad norm: 93657.084 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2106/ 159576 | consumed samples: 33696 | elapsed time per iteration (ms): 14007.1 | learning rate: 9.337E-06 | global batch size: 16 | lm loss: 7.027580E+00 | loss scale: 16384.0 | grad norm: 62511.740 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2107/ 159576 | consumed samples: 33712 | elapsed time per iteration (ms): 13598.0 | learning rate: 9.342E-06 | global batch size: 16 | lm loss: 7.132909E+00 | loss scale: 16384.0 | grad norm: 177960.752 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2108/ 159576 | consumed samples: 33728 | elapsed time per iteration (ms): 13635.0 | learning rate: 9.346E-06 | global batch size: 16 | lm loss: 7.048873E+00 | 
loss scale: 16384.0 | grad norm: 122116.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2109/ 159576 | consumed samples: 33744 | elapsed time per iteration (ms): 13663.3 | learning rate: 9.351E-06 | global batch size: 16 | lm loss: 6.996678E+00 | loss scale: 16384.0 | grad norm: 85763.068 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2110/ 159576 | consumed samples: 33760 | elapsed time per iteration (ms): 13680.8 | learning rate: 9.355E-06 | global batch size: 16 | lm loss: 6.889836E+00 | loss scale: 16384.0 | grad norm: 84089.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2111/ 159576 | consumed samples: 33776 | elapsed time per iteration (ms): 13628.5 | learning rate: 9.359E-06 | global batch size: 16 | lm loss: 6.968468E+00 | loss scale: 16384.0 | grad norm: 51256.696 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2112/ 159576 | consumed samples: 33792 | elapsed time per iteration (ms): 13610.9 | learning rate: 9.364E-06 | global batch size: 16 | lm loss: 6.917239E+00 | loss scale: 16384.0 | grad norm: 126008.694 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2113/ 159576 | consumed samples: 33808 | elapsed time per iteration (ms): 13593.1 | learning rate: 9.368E-06 | global batch size: 16 | lm loss: 6.871556E+00 | loss scale: 16384.0 | grad norm: 67758.611 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2114/ 159576 | consumed samples: 33824 | elapsed time per iteration (ms): 13663.1 | learning rate: 9.373E-06 | global batch size: 16 | lm loss: 6.927833E+00 | loss scale: 16384.0 | grad norm: 85851.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2115/ 159576 | consumed samples: 33840 | elapsed time per iteration (ms): 13986.1 | learning rate: 9.377E-06 | global batch size: 16 | lm loss: 6.965062E+00 | loss scale: 16384.0 | grad norm: 65169.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2116/ 159576 | consumed samples: 33856 | elapsed time per iteration (ms): 13585.2 | learning rate: 9.382E-06 | global batch size: 16 | lm loss: 7.081017E+00 | loss scale: 16384.0 | grad norm: 73782.925 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2117/ 159576 | consumed samples: 33872 | elapsed time per iteration (ms): 13717.9 | learning rate: 9.386E-06 | global batch size: 16 | lm loss: 7.005242E+00 | loss scale: 16384.0 | grad norm: 125037.412 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2118/ 159576 | consumed samples: 33888 | elapsed time per iteration (ms): 13567.3 | learning rate: 9.391E-06 | global batch size: 16 | lm loss: 6.785961E+00 | loss scale: 16384.0 | grad norm: 74382.903 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2119/ 159576 | consumed samples: 33904 | elapsed time per iteration (ms): 13839.4 | learning rate: 9.395E-06 | global batch size: 16 | lm loss: 7.037541E+00 | loss scale: 16384.0 | grad norm: 61070.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2120/ 159576 | consumed samples: 33920 | elapsed 
time per iteration (ms): 13840.1 | learning rate: 9.399E-06 | global batch size: 16 | lm loss: 6.688106E+00 | loss scale: 16384.0 | grad norm: 77514.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2121/ 159576 | consumed samples: 33936 | elapsed time per iteration (ms): 13591.3 | learning rate: 9.404E-06 | global batch size: 16 | lm loss: 6.965182E+00 | loss scale: 16384.0 | grad norm: 85559.683 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2122/ 159576 | consumed samples: 33952 | elapsed time per iteration (ms): 13658.1 | learning rate: 9.408E-06 | global batch size: 16 | lm loss: 6.891047E+00 | loss scale: 16384.0 | grad norm: 84454.855 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2123/ 159576 | consumed samples: 33968 | elapsed time per iteration (ms): 13650.8 | learning rate: 9.413E-06 | global batch size: 16 | lm loss: 6.784370E+00 | loss scale: 16384.0 | grad norm: 74803.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2124/ 159576 | consumed samples: 33984 | elapsed time per iteration (ms): 13935.2 | learning rate: 9.417E-06 | global batch size: 16 | lm loss: 6.885671E+00 | loss scale: 16384.0 | grad norm: 68340.117 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2125/ 159576 | consumed samples: 34000 | elapsed time per iteration (ms): 13650.4 | learning rate: 9.422E-06 | global batch size: 16 | lm loss: 7.116186E+00 | loss scale: 16384.0 | grad norm: 75719.601 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2126/ 159576 | consumed samples: 34016 | elapsed time per iteration (ms): 13617.2 | learning rate: 9.426E-06 | global batch size: 16 | lm loss: 6.759393E+00 | loss scale: 16384.0 | grad norm: 57051.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2127/ 159576 | consumed samples: 34032 | elapsed time per iteration (ms): 13606.4 | learning rate: 9.430E-06 | global batch size: 16 | lm loss: 6.895882E+00 | loss scale: 16384.0 | grad norm: 117422.540 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2128/ 159576 | consumed samples: 34048 | elapsed time per iteration (ms): 13879.5 | learning rate: 9.435E-06 | global batch size: 16 | lm loss: 6.990780E+00 | loss scale: 16384.0 | grad norm: 47327.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2129/ 159576 | consumed samples: 34064 | elapsed time per iteration (ms): 13685.2 | learning rate: 9.439E-06 | global batch size: 16 | lm loss: 6.883922E+00 | loss scale: 16384.0 | grad norm: 75631.645 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2130/ 159576 | consumed samples: 34080 | elapsed time per iteration (ms): 13677.5 | learning rate: 9.444E-06 | global batch size: 16 | lm loss: 6.880146E+00 | loss scale: 16384.0 | grad norm: 70634.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2131/ 159576 | consumed samples: 34096 | elapsed time per iteration (ms): 13735.8 | learning rate: 9.448E-06 | global batch size: 16 | lm loss: 6.800762E+00 | loss scale: 16384.0 | grad norm: 114482.498 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2132/ 159576 | consumed samples: 34112 | elapsed time per iteration (ms): 13614.4 | learning rate: 9.453E-06 | global batch size: 16 | lm loss: 7.057775E+00 | loss scale: 16384.0 | grad norm: 131631.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2133/ 159576 | consumed samples: 34128 | elapsed time per iteration (ms): 13899.1 | learning rate: 9.457E-06 | global batch size: 16 | lm loss: 7.006071E+00 | loss scale: 16384.0 | grad norm: 88510.853 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2134/ 159576 | consumed samples: 34144 | elapsed time per iteration (ms): 13637.7 | learning rate: 9.462E-06 | global batch size: 16 | lm loss: 7.062113E+00 | loss scale: 16384.0 | grad norm: 75449.578 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2135/ 159576 | consumed samples: 34160 | elapsed time per iteration (ms): 13602.2 | learning rate: 9.466E-06 | global batch size: 16 | lm loss: 7.078564E+00 | loss scale: 16384.0 | grad norm: 130110.649 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2136/ 159576 | consumed samples: 34176 | elapsed time per iteration (ms): 13592.0 | learning rate: 9.470E-06 | global batch size: 16 | lm loss: 6.814717E+00 | loss scale: 16384.0 | grad norm: 149407.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2137/ 159576 | consumed samples: 34192 | elapsed time per iteration (ms): 14082.9 | learning rate: 9.475E-06 | global batch size: 16 | lm loss: 6.978102E+00 | loss scale: 16384.0 | grad norm: 53919.012 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2138/ 159576 | consumed samples: 34208 | elapsed time per iteration (ms): 13782.2 | learning rate: 9.479E-06 | global batch size: 16 | lm loss: 6.799563E+00 | loss scale: 16384.0 | grad norm: 71961.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2139/ 159576 | consumed samples: 34224 | elapsed time per iteration (ms): 13617.0 | learning rate: 9.484E-06 | global batch size: 16 | lm loss: 6.855867E+00 | loss scale: 16384.0 | grad norm: 59818.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2140/ 159576 | consumed samples: 34240 | elapsed time per iteration (ms): 13639.2 | learning rate: 9.488E-06 | global batch size: 16 | lm loss: 6.902345E+00 | loss scale: 16384.0 | grad norm: 58890.649 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2141/ 159576 | consumed samples: 34256 | elapsed time per iteration (ms): 13987.1 | learning rate: 9.493E-06 | global batch size: 16 | lm loss: 6.755795E+00 | loss scale: 16384.0 | grad norm: 77002.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2142/ 159576 | consumed samples: 34272 | elapsed time per iteration (ms): 13630.0 | learning rate: 9.497E-06 | global batch size: 16 | lm loss: 6.875304E+00 | loss scale: 16384.0 | grad norm: 67923.163 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2143/ 159576 | consumed samples: 34288 | elapsed time per iteration (ms): 13550.6 | learning rate: 9.501E-06 | global batch size: 
16 | lm loss: 6.950579E+00 | loss scale: 16384.0 | grad norm: 177721.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2144/ 159576 | consumed samples: 34304 | elapsed time per iteration (ms): 13618.0 | learning rate: 9.506E-06 | global batch size: 16 | lm loss: 6.968021E+00 | loss scale: 16384.0 | grad norm: 116784.963 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2145/ 159576 | consumed samples: 34320 | elapsed time per iteration (ms): 13676.0 | learning rate: 9.510E-06 | global batch size: 16 | lm loss: 6.878886E+00 | loss scale: 16384.0 | grad norm: 69612.138 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2146/ 159576 | consumed samples: 34336 | elapsed time per iteration (ms): 13771.3 | learning rate: 9.515E-06 | global batch size: 16 | lm loss: 6.903853E+00 | loss scale: 16384.0 | grad norm: 80623.990 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2147/ 159576 | consumed samples: 34352 | elapsed time per iteration (ms): 13687.5 | learning rate: 9.519E-06 | global batch size: 16 | lm loss: 6.992352E+00 | loss scale: 16384.0 | grad norm: 50990.170 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2148/ 159576 | consumed samples: 34368 | elapsed time per iteration (ms): 13681.5 | learning rate: 9.524E-06 | global batch size: 16 | lm loss: 6.979048E+00 | loss scale: 16384.0 | grad norm: 120685.818 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2149/ 159576 | consumed samples: 34384 | elapsed time per iteration (ms): 13585.6 | learning rate: 9.528E-06 | global batch size: 16 | lm loss: 6.962264E+00 | loss scale: 16384.0 | grad norm: 95096.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2150/ 159576 | consumed samples: 34400 | elapsed time per iteration (ms): 13964.4 | learning rate: 9.533E-06 | global batch size: 16 | lm loss: 7.070148E+00 | loss scale: 16384.0 | grad norm: 102834.582 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2151/ 159576 | consumed samples: 34416 | elapsed time per iteration (ms): 13597.2 | learning rate: 9.537E-06 | global batch size: 16 | lm loss: 6.998973E+00 | loss scale: 16384.0 | grad norm: 66036.970 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2152/ 159576 | consumed samples: 34432 | elapsed time per iteration (ms): 13608.8 | learning rate: 9.541E-06 | global batch size: 16 | lm loss: 6.972906E+00 | loss scale: 16384.0 | grad norm: 85292.027 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2153/ 159576 | consumed samples: 34448 | elapsed time per iteration (ms): 13623.2 | learning rate: 9.546E-06 | global batch size: 16 | lm loss: 6.755056E+00 | loss scale: 16384.0 | grad norm: 76762.492 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2154/ 159576 | consumed samples: 34464 | elapsed time per iteration (ms): 13956.2 | learning rate: 9.550E-06 | global batch size: 16 | lm loss: 7.015395E+00 | loss scale: 16384.0 | grad norm: 90062.733 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2155/ 159576 | 
consumed samples: 34480 | elapsed time per iteration (ms): 13759.1 | learning rate: 9.555E-06 | global batch size: 16 | lm loss: 6.815333E+00 | loss scale: 16384.0 | grad norm: 68441.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2156/ 159576 | consumed samples: 34496 | elapsed time per iteration (ms): 13580.0 | learning rate: 9.559E-06 | global batch size: 16 | lm loss: 6.783628E+00 | loss scale: 16384.0 | grad norm: 110716.577 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2157/ 159576 | consumed samples: 34512 | elapsed time per iteration (ms): 13582.3 | learning rate: 9.564E-06 | global batch size: 16 | lm loss: 7.064082E+00 | loss scale: 16384.0 | grad norm: 62285.534 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[... iterations 2158-2304 elided, same record format: learning rate ramps from 9.568E-06 to 1.022E-05, lm loss moves between 6.410736E+00 and 7.318023E+00, grad norm between 37745.320 and 224387.512, elapsed time ~13.5-14.2 s per iteration ...]
iteration 2305/ 159576 | consumed samples: 36880 | elapsed time per iteration (ms): 13606.1 | learning rate: 1.022E-05 | global batch size: 16 | lm loss: 6.896228E+00 | loss scale: 16384.0 | grad norm: 59758.026 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2306/ 159576 | consumed samples: 36896 | elapsed time per iteration (ms): 13630.5 | learning rate: 1.022E-05 | global batch size: 16 | lm loss: 6.715625E+00 | loss scale: 16384.0 | grad norm: 82018.871 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
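Since the records follow a fixed " | "-separated layout, they can be mined mechanically. Below is a minimal parsing sketch in Python; the field names are taken from the records above, while the regex, function name, and output dict are illustrative assumptions, not part of the training code:

import re

# One pattern per record; the field order matches the log lines above.
RECORD = re.compile(
    r"iteration\s+(?P<it>\d+)/\s*\d+\s*\|"
    r"\s*consumed samples:\s*(?P<samples>\d+)\s*\|"
    r"\s*elapsed time per iteration \(ms\):\s*(?P<ms>[0-9.]+)\s*\|"
    r"\s*learning rate:\s*(?P<lr>[0-9.Ee+-]+)\s*\|"
    r"\s*global batch size:\s*(?P<gbs>\d+)\s*\|"
    r"\s*lm loss:\s*(?P<loss>[0-9.Ee+-]+)\s*\|"
    r"\s*loss scale:\s*(?P<scale>[0-9.]+)\s*\|"
    r"\s*grad norm:\s*(?P<gnorm>[0-9.]+)"
)

def iter_records(text):
    # Yield one dict per "iteration N/ M | ..." record found in a raw dump.
    for m in RECORD.finditer(text):
        yield {
            "iteration": int(m["it"]),
            "consumed_samples": int(m["samples"]),
            "ms_per_iter": float(m["ms"]),
            "lr": float(m["lr"]),
            "global_batch_size": int(m["gbs"]),
            "lm_loss": float(m["loss"]),
            "loss_scale": float(m["scale"]),
            "grad_norm": float(m["gnorm"]),
        }

sample = ("iteration 2156/ 159576 | consumed samples: 34496 | "
          "elapsed time per iteration (ms): 13580.0 | learning rate: 9.559E-06 | "
          "global batch size: 16 | lm loss: 6.783628E+00 | loss scale: 16384.0 | "
          "grad norm: 110716.577")
print(next(iter_records(sample)))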
[2021-09-24 11:06:17] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition)
[2021-09-24 11:06:17] PULSE: tr8-104B is running for 5:14:06 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8])
[... iterations 2307-2315 elided, same record format: learning rate 1.023E-05 to 1.026E-05, lm loss between 6.739760E+00 and 7.147439E+00, grad norm between 55236.968 and 184283.626 ...]
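The second PULSE line above gives the allocated nodes in SLURM's compressed notation (e.g. r6i4n[5,7]). A small sketch to expand that notation and count nodes; it assumes at most one bracket group per entry, which holds for the list above:

import re

def expand_nodelist(nodelist):
    # Expand SLURM-style "r6i4n[5,7-8],r8i2n8" into individual node names.
    nodes = []
    for m in re.finditer(r"([^,\[\]]+)(?:\[([^\]]+)\])?", nodelist):
        prefix, ranges = m.group(1), m.group(2)
        if ranges is None:
            nodes.append(prefix)          # bare node name, e.g. "r8i2n8"
            continue
        for part in ranges.split(","):
            if "-" in part:
                lo, hi = part.split("-")
                # zero-pad to the width used in the range start
                nodes += [f"{prefix}{i:0{len(lo)}d}" for i in range(int(lo), int(hi) + 1)]
            else:
                nodes.append(prefix + part)
    return nodes

print(expand_nodelist("r6i4n[5,7],r6i5n[2,7-8],r8i2n8"))
# ['r6i4n5', 'r6i4n7', 'r6i5n2', 'r6i5n7', 'r6i5n8', 'r8i2n8']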
iteration 2316/ 159576 | consumed samples: 37056 | elapsed time per iteration (ms): 14019.7 | learning rate: 1.027E-05 | global batch size: 16 | lm loss: 6.766959E+00 | loss scale: 16384.0 | grad norm: 70142.885 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[... iterations 2317-2340 elided, same record format: learning rate 1.027E-05 to 1.038E-05, lm loss between 6.532575E+00 and 7.017688E+00, grad norm between 47255.254 and 152401.829 ...]
iteration 2341/ 159576 | consumed samples: 37456 | elapsed time per iteration (ms): 13582.6 | learning rate: 1.038E-05 | global batch size: 16 | lm loss: 7.188419E+00 | loss scale: 16384.0 | grad norm: 400324.609 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2342/ 159576 | consumed samples: 37472 | elapsed time per iteration (ms): 13646.9 | learning rate: 1.038E-05 | global batch size: 16 | lm loss: 7.124457E+00 | loss scale: 16384.0 | grad norm: 441674.699 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2343/ 159576 | consumed samples: 37488 | elapsed time per iteration (ms): 13721.9 | learning rate: 1.039E-05 | global batch size: 16 | lm loss: 6.941244E+00 | loss scale: 16384.0 | grad norm: 218702.636 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[... iterations 2344-2395 elided, same record format: learning rate 1.039E-05 to 1.062E-05, lm loss between 6.409803E+00 and 7.303653E+00, grad norm between 45381.080 and 221068.794 ...]
iteration 2396/ 159576 | consumed samples: 38336 | elapsed time per iteration (ms): 13483.1 | learning rate: 1.062E-05 | global batch size: 16 | lm loss: 6.938586E+00 | loss scale: 16384.0 | grad norm: 80283.093 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2397/ 159576 | consumed samples: 38352 | elapsed time per iteration (ms): 13678.0 | learning rate: 1.063E-05 | global batch size: 16 | lm loss: 6.916673E+00 | loss scale: 16384.0 | grad norm: 78417.587 |
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2398/ 159576 | consumed samples: 38368 | elapsed time per iteration (ms): 13713.3 | learning rate: 1.063E-05 | global batch size: 16 | lm loss: 6.894761E+00 | loss scale: 16384.0 | grad norm: 79613.154 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2399/ 159576 | consumed samples: 38384 | elapsed time per iteration (ms): 13844.0 | learning rate: 1.064E-05 | global batch size: 16 | lm loss: 6.895288E+00 | loss scale: 16384.0 | grad norm: 117360.649 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2400/ 159576 | consumed samples: 38400 | elapsed time per iteration (ms): 13869.8 | learning rate: 1.064E-05 | global batch size: 16 | lm loss: 7.002610E+00 | loss scale: 16384.0 | grad norm: 98958.976 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2401/ 159576 | consumed samples: 38416 | elapsed time per iteration (ms): 13601.8 | learning rate: 1.065E-05 | global batch size: 16 | lm loss: 6.744779E+00 | loss scale: 16384.0 | grad norm: 75497.856 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2402/ 159576 | consumed samples: 38432 | elapsed time per iteration (ms): 13599.2 | learning rate: 1.065E-05 | global batch size: 16 | lm loss: 7.107717E+00 | loss scale: 16384.0 | grad norm: 78343.496 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2403/ 159576 | consumed samples: 38448 | elapsed time per iteration (ms): 13623.1 | learning rate: 1.066E-05 | global batch size: 16 | lm loss: 6.897991E+00 | loss scale: 16384.0 | grad norm: 89054.642 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2404/ 159576 | consumed samples: 38464 | elapsed time per iteration (ms): 14088.2 | learning rate: 1.066E-05 | global batch size: 16 | lm loss: 6.915084E+00 | loss scale: 16384.0 | grad norm: 88153.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2405/ 159576 | consumed samples: 38480 | elapsed time per iteration (ms): 13711.7 | learning rate: 1.066E-05 | global batch size: 16 | lm loss: 6.791551E+00 | loss scale: 16384.0 | grad norm: 81047.565 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2406/ 159576 | consumed samples: 38496 | elapsed time per iteration (ms): 13659.9 | learning rate: 1.067E-05 | global batch size: 16 | lm loss: 6.768214E+00 | loss scale: 16384.0 | grad norm: 63942.069 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2407/ 159576 | consumed samples: 38512 | elapsed time per iteration (ms): 13659.5 | learning rate: 1.067E-05 | global batch size: 16 | lm loss: 6.785830E+00 | loss scale: 16384.0 | grad norm: 50544.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2408/ 159576 | consumed samples: 38528 | elapsed time per iteration (ms): 14010.2 | learning rate: 1.068E-05 | global batch size: 16 | lm loss: 6.781000E+00 | loss scale: 16384.0 | grad norm: 114170.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2409/ 159576 | consumed samples: 38544 | elapsed time per iteration (ms): 13587.7 | learning 
rate: 1.068E-05 | global batch size: 16 | lm loss: 6.876911E+00 | loss scale: 16384.0 | grad norm: 60235.711 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2410/ 159576 | consumed samples: 38560 | elapsed time per iteration (ms): 13605.6 | learning rate: 1.069E-05 | global batch size: 16 | lm loss: 6.837091E+00 | loss scale: 16384.0 | grad norm: 72387.988 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2411/ 159576 | consumed samples: 38576 | elapsed time per iteration (ms): 13675.7 | learning rate: 1.069E-05 | global batch size: 16 | lm loss: 6.912636E+00 | loss scale: 16384.0 | grad norm: 76432.994 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2412/ 159576 | consumed samples: 38592 | elapsed time per iteration (ms): 13569.6 | learning rate: 1.070E-05 | global batch size: 16 | lm loss: 6.712539E+00 | loss scale: 16384.0 | grad norm: 113832.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2413/ 159576 | consumed samples: 38608 | elapsed time per iteration (ms): 13932.9 | learning rate: 1.070E-05 | global batch size: 16 | lm loss: 6.804219E+00 | loss scale: 16384.0 | grad norm: 73073.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2414/ 159576 | consumed samples: 38624 | elapsed time per iteration (ms): 13742.1 | learning rate: 1.070E-05 | global batch size: 16 | lm loss: 6.947999E+00 | loss scale: 16384.0 | grad norm: 90599.997 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2415/ 159576 | consumed samples: 38640 | elapsed time per iteration (ms): 13556.3 | learning rate: 1.071E-05 | global batch size: 16 | lm loss: 7.002557E+00 | loss scale: 16384.0 | grad norm: 71840.830 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2416/ 159576 | consumed samples: 38656 | elapsed time per iteration (ms): 13593.5 | learning rate: 1.071E-05 | global batch size: 16 | lm loss: 6.920745E+00 | loss scale: 16384.0 | grad norm: 60284.538 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2417/ 159576 | consumed samples: 38672 | elapsed time per iteration (ms): 14084.6 | learning rate: 1.072E-05 | global batch size: 16 | lm loss: 7.137000E+00 | loss scale: 16384.0 | grad norm: 185539.999 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2418/ 159576 | consumed samples: 38688 | elapsed time per iteration (ms): 13641.5 | learning rate: 1.072E-05 | global batch size: 16 | lm loss: 6.757603E+00 | loss scale: 16384.0 | grad norm: 127319.529 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2419/ 159576 | consumed samples: 38704 | elapsed time per iteration (ms): 13580.1 | learning rate: 1.073E-05 | global batch size: 16 | lm loss: 6.869411E+00 | loss scale: 16384.0 | grad norm: 97709.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2420/ 159576 | consumed samples: 38720 | elapsed time per iteration (ms): 13629.2 | learning rate: 1.073E-05 | global batch size: 16 | lm loss: 6.709553E+00 | loss scale: 16384.0 | grad norm: 92144.986 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time 
(ms) iteration 2421/ 159576 | consumed samples: 38736 | elapsed time per iteration (ms): 14151.6 | learning rate: 1.074E-05 | global batch size: 16 | lm loss: 6.884684E+00 | loss scale: 16384.0 | grad norm: 68698.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2422/ 159576 | consumed samples: 38752 | elapsed time per iteration (ms): 13613.5 | learning rate: 1.074E-05 | global batch size: 16 | lm loss: 6.869916E+00 | loss scale: 16384.0 | grad norm: 183504.116 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2423/ 159576 | consumed samples: 38768 | elapsed time per iteration (ms): 13633.7 | learning rate: 1.074E-05 | global batch size: 16 | lm loss: 6.890718E+00 | loss scale: 16384.0 | grad norm: 156548.776 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2424/ 159576 | consumed samples: 38784 | elapsed time per iteration (ms): 13607.9 | learning rate: 1.075E-05 | global batch size: 16 | lm loss: 6.935307E+00 | loss scale: 16384.0 | grad norm: 64330.150 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2425/ 159576 | consumed samples: 38800 | elapsed time per iteration (ms): 13605.4 | learning rate: 1.075E-05 | global batch size: 16 | lm loss: 6.766086E+00 | loss scale: 16384.0 | grad norm: 69465.082 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2426/ 159576 | consumed samples: 38816 | elapsed time per iteration (ms): 13928.6 | learning rate: 1.076E-05 | global batch size: 16 | lm loss: 7.066947E+00 | loss scale: 16384.0 | grad norm: 107634.865 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2427/ 159576 | consumed samples: 38832 | elapsed time per iteration (ms): 13650.1 | learning rate: 1.076E-05 | global batch size: 16 | lm loss: 7.050639E+00 | loss scale: 16384.0 | grad norm: 95342.870 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2428/ 159576 | consumed samples: 38848 | elapsed time per iteration (ms): 13681.2 | learning rate: 1.077E-05 | global batch size: 16 | lm loss: 6.855616E+00 | loss scale: 16384.0 | grad norm: 59595.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2429/ 159576 | consumed samples: 38864 | elapsed time per iteration (ms): 13695.9 | learning rate: 1.077E-05 | global batch size: 16 | lm loss: 7.041804E+00 | loss scale: 16384.0 | grad norm: 65131.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2430/ 159576 | consumed samples: 38880 | elapsed time per iteration (ms): 13962.7 | learning rate: 1.078E-05 | global batch size: 16 | lm loss: 6.803939E+00 | loss scale: 16384.0 | grad norm: 63269.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2431/ 159576 | consumed samples: 38896 | elapsed time per iteration (ms): 13583.2 | learning rate: 1.078E-05 | global batch size: 16 | lm loss: 6.876345E+00 | loss scale: 16384.0 | grad norm: 74949.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2432/ 159576 | consumed samples: 38912 | elapsed time per iteration (ms): 13606.6 | learning rate: 1.078E-05 | global batch size: 16 | lm loss: 6.916327E+00 | loss scale: 16384.0 | 
grad norm: 74586.629 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2433/ 159576 | consumed samples: 38928 | elapsed time per iteration (ms): 13607.5 | learning rate: 1.079E-05 | global batch size: 16 | lm loss: 6.779680E+00 | loss scale: 16384.0 | grad norm: 82519.547 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2434/ 159576 | consumed samples: 38944 | elapsed time per iteration (ms): 13894.0 | learning rate: 1.079E-05 | global batch size: 16 | lm loss: 6.903611E+00 | loss scale: 16384.0 | grad norm: 69004.549 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2435/ 159576 | consumed samples: 38960 | elapsed time per iteration (ms): 13779.1 | learning rate: 1.080E-05 | global batch size: 16 | lm loss: 6.630243E+00 | loss scale: 16384.0 | grad norm: 107197.604 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2436/ 159576 | consumed samples: 38976 | elapsed time per iteration (ms): 13659.0 | learning rate: 1.080E-05 | global batch size: 16 | lm loss: 6.876919E+00 | loss scale: 16384.0 | grad norm: 77407.687 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2437/ 159576 | consumed samples: 38992 | elapsed time per iteration (ms): 13553.5 | learning rate: 1.081E-05 | global batch size: 16 | lm loss: 6.728307E+00 | loss scale: 16384.0 | grad norm: 79645.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2438/ 159576 | consumed samples: 39008 | elapsed time per iteration (ms): 13664.0 | learning rate: 1.081E-05 | global batch size: 16 | lm loss: 6.923852E+00 | loss scale: 16384.0 | grad norm: 70221.677 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2439/ 159576 | consumed samples: 39024 | elapsed time per iteration (ms): 13814.4 | learning rate: 1.082E-05 | global batch size: 16 | lm loss: 6.729681E+00 | loss scale: 16384.0 | grad norm: 71734.084 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2440/ 159576 | consumed samples: 39040 | elapsed time per iteration (ms): 13667.6 | learning rate: 1.082E-05 | global batch size: 16 | lm loss: 6.668837E+00 | loss scale: 16384.0 | grad norm: 69995.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2441/ 159576 | consumed samples: 39056 | elapsed time per iteration (ms): 13617.8 | learning rate: 1.082E-05 | global batch size: 16 | lm loss: 6.781438E+00 | loss scale: 16384.0 | grad norm: 49304.992 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2442/ 159576 | consumed samples: 39072 | elapsed time per iteration (ms): 13652.0 | learning rate: 1.083E-05 | global batch size: 16 | lm loss: 6.810652E+00 | loss scale: 16384.0 | grad norm: 86564.989 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2443/ 159576 | consumed samples: 39088 | elapsed time per iteration (ms): 14063.1 | learning rate: 1.083E-05 | global batch size: 16 | lm loss: 6.879047E+00 | loss scale: 16384.0 | grad norm: 56659.131 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2444/ 159576 | consumed samples: 39104 | elapsed time per iteration (ms): 
13586.9 | learning rate: 1.084E-05 | global batch size: 16 | lm loss: 6.494076E+00 | loss scale: 16384.0 | grad norm: 72585.008 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2445/ 159576 | consumed samples: 39120 | elapsed time per iteration (ms): 13676.6 | learning rate: 1.084E-05 | global batch size: 16 | lm loss: 6.713490E+00 | loss scale: 16384.0 | grad norm: 68348.114 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2446/ 159576 | consumed samples: 39136 | elapsed time per iteration (ms): 13706.8 | learning rate: 1.085E-05 | global batch size: 16 | lm loss: 6.970970E+00 | loss scale: 16384.0 | grad norm: 145461.809 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2447/ 159576 | consumed samples: 39152 | elapsed time per iteration (ms): 13581.7 | learning rate: 1.085E-05 | global batch size: 16 | lm loss: 6.777845E+00 | loss scale: 16384.0 | grad norm: 67935.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2448/ 159576 | consumed samples: 39168 | elapsed time per iteration (ms): 13810.2 | learning rate: 1.086E-05 | global batch size: 16 | lm loss: 6.772415E+00 | loss scale: 16384.0 | grad norm: 86835.992 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2449/ 159576 | consumed samples: 39184 | elapsed time per iteration (ms): 13641.6 | learning rate: 1.086E-05 | global batch size: 16 | lm loss: 6.901608E+00 | loss scale: 16384.0 | grad norm: 86381.928 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2450/ 159576 | consumed samples: 39200 | elapsed time per iteration (ms): 13577.4 | learning rate: 1.086E-05 | global batch size: 16 | lm loss: 6.923601E+00 | loss scale: 16384.0 | grad norm: 67065.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2451/ 159576 | consumed samples: 39216 | elapsed time per iteration (ms): 13656.8 | learning rate: 1.087E-05 | global batch size: 16 | lm loss: 6.635858E+00 | loss scale: 16384.0 | grad norm: 118766.424 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2452/ 159576 | consumed samples: 39232 | elapsed time per iteration (ms): 14182.2 | learning rate: 1.087E-05 | global batch size: 16 | lm loss: 6.798747E+00 | loss scale: 16384.0 | grad norm: 86778.590 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2453/ 159576 | consumed samples: 39248 | elapsed time per iteration (ms): 13794.7 | learning rate: 1.088E-05 | global batch size: 16 | lm loss: 6.934669E+00 | loss scale: 16384.0 | grad norm: 72867.559 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2454/ 159576 | consumed samples: 39264 | elapsed time per iteration (ms): 13649.1 | learning rate: 1.088E-05 | global batch size: 16 | lm loss: 6.689157E+00 | loss scale: 16384.0 | grad norm: 53809.726 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2455/ 159576 | consumed samples: 39280 | elapsed time per iteration (ms): 13619.0 | learning rate: 1.089E-05 | global batch size: 16 | lm loss: 6.797565E+00 | loss scale: 16384.0 | grad norm: 130277.119 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | time (ms) iteration 2456/ 159576 | consumed samples: 39296 | elapsed time per iteration (ms): 14036.7 | learning rate: 1.089E-05 | global batch size: 16 | lm loss: 6.919378E+00 | loss scale: 16384.0 | grad norm: 68731.938 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2457/ 159576 | consumed samples: 39312 | elapsed time per iteration (ms): 13656.3 | learning rate: 1.089E-05 | global batch size: 16 | lm loss: 6.658165E+00 | loss scale: 16384.0 | grad norm: 90782.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2458/ 159576 | consumed samples: 39328 | elapsed time per iteration (ms): 13635.5 | learning rate: 1.090E-05 | global batch size: 16 | lm loss: 6.614546E+00 | loss scale: 16384.0 | grad norm: 80319.945 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2459/ 159576 | consumed samples: 39344 | elapsed time per iteration (ms): 13648.3 | learning rate: 1.090E-05 | global batch size: 16 | lm loss: 6.813863E+00 | loss scale: 16384.0 | grad norm: 96291.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2460/ 159576 | consumed samples: 39360 | elapsed time per iteration (ms): 13655.8 | learning rate: 1.091E-05 | global batch size: 16 | lm loss: 7.162710E+00 | loss scale: 16384.0 | grad norm: 58863.008 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2461/ 159576 | consumed samples: 39376 | elapsed time per iteration (ms): 13960.2 | learning rate: 1.091E-05 | global batch size: 16 | lm loss: 6.991768E+00 | loss scale: 16384.0 | grad norm: 72538.165 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2462/ 159576 | consumed samples: 39392 | elapsed time per iteration (ms): 13649.7 | learning rate: 1.092E-05 | global batch size: 16 | lm loss: 6.712080E+00 | loss scale: 16384.0 | grad norm: 76061.911 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2463/ 159576 | consumed samples: 39408 | elapsed time per iteration (ms): 13665.9 | learning rate: 1.092E-05 | global batch size: 16 | lm loss: 6.697587E+00 | loss scale: 16384.0 | grad norm: 78444.184 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2464/ 159576 | consumed samples: 39424 | elapsed time per iteration (ms): 13548.3 | learning rate: 1.093E-05 | global batch size: 16 | lm loss: 6.767040E+00 | loss scale: 16384.0 | grad norm: 71114.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2465/ 159576 | consumed samples: 39440 | elapsed time per iteration (ms): 13972.6 | learning rate: 1.093E-05 | global batch size: 16 | lm loss: 6.750882E+00 | loss scale: 16384.0 | grad norm: 60498.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2466/ 159576 | consumed samples: 39456 | elapsed time per iteration (ms): 13657.9 | learning rate: 1.093E-05 | global batch size: 16 | lm loss: 6.631062E+00 | loss scale: 16384.0 | grad norm: 75019.075 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2467/ 159576 | consumed samples: 39472 | elapsed time per iteration (ms): 13692.3 | learning rate: 1.094E-05 | global batch size: 16 | lm loss: 6.725332E+00 | loss 
scale: 16384.0 | grad norm: 53922.114 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2468/ 159576 | consumed samples: 39488 | elapsed time per iteration (ms): 13656.1 | learning rate: 1.094E-05 | global batch size: 16 | lm loss: 6.736504E+00 | loss scale: 16384.0 | grad norm: 54250.719 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2469/ 159576 | consumed samples: 39504 | elapsed time per iteration (ms): 14009.1 | learning rate: 1.095E-05 | global batch size: 16 | lm loss: 6.881338E+00 | loss scale: 16384.0 | grad norm: 64641.652 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2470/ 159576 | consumed samples: 39520 | elapsed time per iteration (ms): 13853.1 | learning rate: 1.095E-05 | global batch size: 16 | lm loss: 6.742140E+00 | loss scale: 16384.0 | grad norm: 52195.808 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2471/ 159576 | consumed samples: 39536 | elapsed time per iteration (ms): 13541.2 | learning rate: 1.096E-05 | global batch size: 16 | lm loss: 6.830609E+00 | loss scale: 16384.0 | grad norm: 98883.799 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2472/ 159576 | consumed samples: 39552 | elapsed time per iteration (ms): 13618.7 | learning rate: 1.096E-05 | global batch size: 16 | lm loss: 6.770423E+00 | loss scale: 16384.0 | grad norm: 66896.725 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2473/ 159576 | consumed samples: 39568 | elapsed time per iteration (ms): 13623.5 | learning rate: 1.097E-05 | global batch size: 16 | lm loss: 6.926878E+00 | loss scale: 16384.0 | grad norm: 74406.160 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2474/ 159576 | consumed samples: 39584 | elapsed time per iteration (ms): 14089.9 | learning rate: 1.097E-05 | global batch size: 16 | lm loss: 6.834147E+00 | loss scale: 16384.0 | grad norm: 61442.184 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2475/ 159576 | consumed samples: 39600 | elapsed time per iteration (ms): 13713.9 | learning rate: 1.097E-05 | global batch size: 16 | lm loss: 6.711390E+00 | loss scale: 16384.0 | grad norm: 72993.188 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2476/ 159576 | consumed samples: 39616 | elapsed time per iteration (ms): 13666.0 | learning rate: 1.098E-05 | global batch size: 16 | lm loss: 6.715760E+00 | loss scale: 16384.0 | grad norm: 54753.919 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2477/ 159576 | consumed samples: 39632 | elapsed time per iteration (ms): 13628.3 | learning rate: 1.098E-05 | global batch size: 16 | lm loss: 7.034068E+00 | loss scale: 16384.0 | grad norm: 65362.654 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2478/ 159576 | consumed samples: 39648 | elapsed time per iteration (ms): 14016.3 | learning rate: 1.099E-05 | global batch size: 16 | lm loss: 6.848239E+00 | loss scale: 16384.0 | grad norm: 59886.005 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2479/ 159576 | consumed samples: 39664 | elapsed time per 
iteration (ms): 13518.2 | learning rate: 1.099E-05 | global batch size: 16 | lm loss: 6.766425E+00 | loss scale: 32768.0 | grad norm: 47600.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2480/ 159576 | consumed samples: 39680 | elapsed time per iteration (ms): 13611.4 | learning rate: 1.100E-05 | global batch size: 16 | lm loss: 6.569361E+00 | loss scale: 32768.0 | grad norm: 173183.602 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2481/ 159576 | consumed samples: 39696 | elapsed time per iteration (ms): 13649.6 | learning rate: 1.100E-05 | global batch size: 16 | lm loss: 6.977244E+00 | loss scale: 32768.0 | grad norm: 114608.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2482/ 159576 | consumed samples: 39712 | elapsed time per iteration (ms): 13592.7 | learning rate: 1.101E-05 | global batch size: 16 | lm loss: 6.743002E+00 | loss scale: 32768.0 | grad norm: 157122.447 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2483/ 159576 | consumed samples: 39728 | elapsed time per iteration (ms): 13957.3 | learning rate: 1.101E-05 | global batch size: 16 | lm loss: 6.786878E+00 | loss scale: 32768.0 | grad norm: 124608.544 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2484/ 159576 | consumed samples: 39744 | elapsed time per iteration (ms): 13654.6 | learning rate: 1.101E-05 | global batch size: 16 | lm loss: 6.859965E+00 | loss scale: 32768.0 | grad norm: 232222.713 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2485/ 159576 | consumed samples: 39760 | elapsed time per iteration (ms): 13613.9 | learning rate: 1.102E-05 | global batch size: 16 | lm loss: 6.802356E+00 | loss scale: 32768.0 | grad norm: 156829.946 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2486/ 159576 | consumed samples: 39776 | elapsed time per iteration (ms): 13653.4 | learning rate: 1.102E-05 | global batch size: 16 | lm loss: 6.710648E+00 | loss scale: 32768.0 | grad norm: 134523.046 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2487/ 159576 | consumed samples: 39792 | elapsed time per iteration (ms): 14072.7 | learning rate: 1.103E-05 | global batch size: 16 | lm loss: 6.797608E+00 | loss scale: 32768.0 | grad norm: 125011.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2488/ 159576 | consumed samples: 39808 | elapsed time per iteration (ms): 13639.9 | learning rate: 1.103E-05 | global batch size: 16 | lm loss: 6.854223E+00 | loss scale: 32768.0 | grad norm: 260551.098 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2489/ 159576 | consumed samples: 39824 | elapsed time per iteration (ms): 13577.6 | learning rate: 1.104E-05 | global batch size: 16 | lm loss: 6.603992E+00 | loss scale: 32768.0 | grad norm: 181893.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2490/ 159576 | consumed samples: 39840 | elapsed time per iteration (ms): 13675.7 | learning rate: 1.104E-05 | global batch size: 16 | lm loss: 6.694830E+00 | loss scale: 32768.0 | grad norm: 141757.675 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2491/ 159576 | consumed samples: 39856 | elapsed time per iteration (ms): 14083.9 | learning rate: 1.105E-05 | global batch size: 16 | lm loss: 6.642892E+00 | loss scale: 32768.0 | grad norm: 119287.049 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2492/ 159576 | consumed samples: 39872 | elapsed time per iteration (ms): 13603.6 | learning rate: 1.105E-05 | global batch size: 16 | lm loss: 6.801910E+00 | loss scale: 32768.0 | grad norm: 155539.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2493/ 159576 | consumed samples: 39888 | elapsed time per iteration (ms): 13598.7 | learning rate: 1.105E-05 | global batch size: 16 | lm loss: 6.791874E+00 | loss scale: 32768.0 | grad norm: 122407.998 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2494/ 159576 | consumed samples: 39904 | elapsed time per iteration (ms): 13643.8 | learning rate: 1.106E-05 | global batch size: 16 | lm loss: 6.826643E+00 | loss scale: 32768.0 | grad norm: 128586.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2495/ 159576 | consumed samples: 39920 | elapsed time per iteration (ms): 13584.0 | learning rate: 1.106E-05 | global batch size: 16 | lm loss: 6.715306E+00 | loss scale: 32768.0 | grad norm: 99484.803 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2496/ 159576 | consumed samples: 39936 | elapsed time per iteration (ms): 13754.1 | learning rate: 1.107E-05 | global batch size: 16 | lm loss: 6.833625E+00 | loss scale: 32768.0 | grad norm: 115202.668 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2497/ 159576 | consumed samples: 39952 | elapsed time per iteration (ms): 13634.3 | learning rate: 1.107E-05 | global batch size: 16 | lm loss: 6.915625E+00 | loss scale: 32768.0 | grad norm: 186838.919 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2498/ 159576 | consumed samples: 39968 | elapsed time per iteration (ms): 13644.0 | learning rate: 1.108E-05 | global batch size: 16 | lm loss: 6.967087E+00 | loss scale: 32768.0 | grad norm: 131122.134 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2499/ 159576 | consumed samples: 39984 | elapsed time per iteration (ms): 13681.7 | learning rate: 1.108E-05 | global batch size: 16 | lm loss: 6.760918E+00 | loss scale: 32768.0 | grad norm: 194624.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2500/ 159576 | consumed samples: 40000 | elapsed time per iteration (ms): 14007.6 | learning rate: 1.109E-05 | global batch size: 16 | lm loss: 6.979738E+00 | loss scale: 32768.0 | grad norm: 156689.771 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2501/ 159576 | consumed samples: 40016 | elapsed time per iteration (ms): 13617.5 | learning rate: 1.109E-05 | global batch size: 16 | lm loss: 6.789479E+00 | loss scale: 32768.0 | grad norm: 144780.709 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2502/ 159576 | consumed samples: 40032 | elapsed time per iteration (ms): 13599.5 | learning rate: 1.109E-05 | global batch 
size: 16 | lm loss: 6.864005E+00 | loss scale: 32768.0 | grad norm: 170229.489 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2503/ 159576 | consumed samples: 40048 | elapsed time per iteration (ms): 13573.2 | learning rate: 1.110E-05 | global batch size: 16 | lm loss: 6.666573E+00 | loss scale: 32768.0 | grad norm: 146264.627 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2504/ 159576 | consumed samples: 40064 | elapsed time per iteration (ms): 13981.7 | learning rate: 1.110E-05 | global batch size: 16 | lm loss: 6.757555E+00 | loss scale: 32768.0 | grad norm: 194432.846 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2505/ 159576 | consumed samples: 40080 | elapsed time per iteration (ms): 13815.5 | learning rate: 1.111E-05 | global batch size: 16 | lm loss: 7.060199E+00 | loss scale: 32768.0 | grad norm: 107664.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2506/ 159576 | consumed samples: 40096 | elapsed time per iteration (ms): 13708.3 | learning rate: 1.111E-05 | global batch size: 16 | lm loss: 6.757818E+00 | loss scale: 32768.0 | grad norm: 172391.067 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2507/ 159576 | consumed samples: 40112 | elapsed time per iteration (ms): 13682.1 | learning rate: 1.112E-05 | global batch size: 16 | lm loss: 6.957751E+00 | loss scale: 32768.0 | grad norm: 153732.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2508/ 159576 | consumed samples: 40128 | elapsed time per iteration (ms): 13651.8 | learning rate: 1.112E-05 | global batch size: 16 | lm loss: 6.697278E+00 | loss scale: 32768.0 | grad norm: 269873.049 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2509/ 159576 | consumed samples: 40144 | elapsed time per iteration (ms): 13847.8 | learning rate: 1.113E-05 | global batch size: 16 | lm loss: 6.915687E+00 | loss scale: 32768.0 | grad norm: 203672.027 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2510/ 159576 | consumed samples: 40160 | elapsed time per iteration (ms): 13726.7 | learning rate: 1.113E-05 | global batch size: 16 | lm loss: 6.563999E+00 | loss scale: 32768.0 | grad norm: 156793.595 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2511/ 159576 | consumed samples: 40176 | elapsed time per iteration (ms): 13592.8 | learning rate: 1.113E-05 | global batch size: 16 | lm loss: 6.816392E+00 | loss scale: 32768.0 | grad norm: 174319.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2512/ 159576 | consumed samples: 40192 | elapsed time per iteration (ms): 13663.1 | learning rate: 1.114E-05 | global batch size: 16 | lm loss: 6.610006E+00 | loss scale: 32768.0 | grad norm: 205941.600 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2513/ 159576 | consumed samples: 40208 | elapsed time per iteration (ms): 13997.4 | learning rate: 1.114E-05 | global batch size: 16 | lm loss: 6.968318E+00 | loss scale: 32768.0 | grad norm: 198426.978 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2514/ 
159576 | consumed samples: 40224 | elapsed time per iteration (ms): 13639.5 | learning rate: 1.115E-05 | global batch size: 16 | lm loss: 6.754237E+00 | loss scale: 32768.0 | grad norm: 150994.686 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2515/ 159576 | consumed samples: 40240 | elapsed time per iteration (ms): 13721.6 | learning rate: 1.115E-05 | global batch size: 16 | lm loss: 6.780080E+00 | loss scale: 32768.0 | grad norm: 221933.544 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2516/ 159576 | consumed samples: 40256 | elapsed time per iteration (ms): 13588.8 | learning rate: 1.116E-05 | global batch size: 16 | lm loss: 7.005465E+00 | loss scale: 32768.0 | grad norm: 111981.898 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2517/ 159576 | consumed samples: 40272 | elapsed time per iteration (ms): 13636.9 | learning rate: 1.116E-05 | global batch size: 16 | lm loss: 7.038844E+00 | loss scale: 32768.0 | grad norm: 207331.802 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2518/ 159576 | consumed samples: 40288 | elapsed time per iteration (ms): 13872.4 | learning rate: 1.117E-05 | global batch size: 16 | lm loss: 6.753989E+00 | loss scale: 32768.0 | grad norm: 152725.941 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2519/ 159576 | consumed samples: 40304 | elapsed time per iteration (ms): 13607.9 | learning rate: 1.117E-05 | global batch size: 16 | lm loss: 6.981558E+00 | loss scale: 32768.0 | grad norm: 154949.465 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2520/ 159576 | consumed samples: 40320 | elapsed time per iteration (ms): 13684.9 | learning rate: 1.117E-05 | global batch size: 16 | lm loss: 6.906241E+00 | loss scale: 32768.0 | grad norm: 125549.575 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2521/ 159576 | consumed samples: 40336 | elapsed time per iteration (ms): 13716.2 | learning rate: 1.118E-05 | global batch size: 16 | lm loss: 6.747027E+00 | loss scale: 32768.0 | grad norm: 122780.845 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2522/ 159576 | consumed samples: 40352 | elapsed time per iteration (ms): 14167.1 | learning rate: 1.118E-05 | global batch size: 16 | lm loss: 6.970352E+00 | loss scale: 32768.0 | grad norm: 118819.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2523/ 159576 | consumed samples: 40368 | elapsed time per iteration (ms): 13664.4 | learning rate: 1.119E-05 | global batch size: 16 | lm loss: 6.714174E+00 | loss scale: 32768.0 | grad norm: 146027.986 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2524/ 159576 | consumed samples: 40384 | elapsed time per iteration (ms): 13630.7 | learning rate: 1.119E-05 | global batch size: 16 | lm loss: 6.610335E+00 | loss scale: 32768.0 | grad norm: 242081.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2525/ 159576 | consumed samples: 40400 | elapsed time per iteration (ms): 13685.5 | learning rate: 1.120E-05 | global batch size: 16 | lm loss: 6.889633E+00 | loss scale: 32768.0 | grad norm: 
125371.781 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2526/ 159576 | consumed samples: 40416 | elapsed time per iteration (ms): 13989.6 | learning rate: 1.120E-05 | global batch size: 16 | lm loss: 6.703308E+00 | loss scale: 32768.0 | grad norm: 229244.600 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2527/ 159576 | consumed samples: 40432 | elapsed time per iteration (ms): 13653.7 | learning rate: 1.121E-05 | global batch size: 16 | lm loss: 6.903625E+00 | loss scale: 32768.0 | grad norm: 180615.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2528/ 159576 | consumed samples: 40448 | elapsed time per iteration (ms): 13688.8 | learning rate: 1.121E-05 | global batch size: 16 | lm loss: 6.882591E+00 | loss scale: 32768.0 | grad norm: 123446.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2529/ 159576 | consumed samples: 40464 | elapsed time per iteration (ms): 13727.9 | learning rate: 1.121E-05 | global batch size: 16 | lm loss: 6.771068E+00 | loss scale: 32768.0 | grad norm: 136122.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2530/ 159576 | consumed samples: 40480 | elapsed time per iteration (ms): 13727.3 | learning rate: 1.122E-05 | global batch size: 16 | lm loss: 6.839997E+00 | loss scale: 32768.0 | grad norm: 198759.749 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2531/ 159576 | consumed samples: 40496 | elapsed time per iteration (ms): 13882.2 | learning rate: 1.122E-05 | global batch size: 16 | lm loss: 6.934726E+00 | loss scale: 32768.0 | grad norm: 140393.181 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2532/ 159576 | consumed samples: 40512 | elapsed time per iteration (ms): 13707.7 | learning rate: 1.123E-05 | global batch size: 16 | lm loss: 6.824786E+00 | loss scale: 32768.0 | grad norm: 136497.509 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2533/ 159576 | consumed samples: 40528 | elapsed time per iteration (ms): 13668.7 | learning rate: 1.123E-05 | global batch size: 16 | lm loss: 6.638996E+00 | loss scale: 32768.0 | grad norm: 108086.442 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2534/ 159576 | consumed samples: 40544 | elapsed time per iteration (ms): 13600.7 | learning rate: 1.124E-05 | global batch size: 16 | lm loss: 6.684957E+00 | loss scale: 32768.0 | grad norm: 136205.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2535/ 159576 | consumed samples: 40560 | elapsed time per iteration (ms): 14008.2 | learning rate: 1.124E-05 | global batch size: 16 | lm loss: 6.650595E+00 | loss scale: 32768.0 | grad norm: 89458.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2536/ 159576 | consumed samples: 40576 | elapsed time per iteration (ms): 13696.2 | learning rate: 1.125E-05 | global batch size: 16 | lm loss: 6.720654E+00 | loss scale: 32768.0 | grad norm: 207949.897 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2537/ 159576 | consumed samples: 40592 | elapsed time per iteration (ms): 
13728.0 | learning rate: 1.125E-05 | global batch size: 16 | lm loss: 6.934484E+00 | loss scale: 32768.0 | grad norm: 145165.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2538/ 159576 | consumed samples: 40608 | elapsed time per iteration (ms): 13707.3 | learning rate: 1.125E-05 | global batch size: 16 | lm loss: 6.659933E+00 | loss scale: 32768.0 | grad norm: 109227.116 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2539/ 159576 | consumed samples: 40624 | elapsed time per iteration (ms): 14115.0 | learning rate: 1.126E-05 | global batch size: 16 | lm loss: 6.638377E+00 | loss scale: 32768.0 | grad norm: 221623.574 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2540/ 159576 | consumed samples: 40640 | elapsed time per iteration (ms): 13557.7 | learning rate: 1.126E-05 | global batch size: 16 | lm loss: 6.825821E+00 | loss scale: 32768.0 | grad norm: 114656.887 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2541/ 159576 | consumed samples: 40656 | elapsed time per iteration (ms): 13635.6 | learning rate: 1.127E-05 | global batch size: 16 | lm loss: 6.869952E+00 | loss scale: 32768.0 | grad norm: 204975.764 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2542/ 159576 | consumed samples: 40672 | elapsed time per iteration (ms): 13682.2 | learning rate: 1.127E-05 | global batch size: 16 | lm loss: 6.829473E+00 | loss scale: 32768.0 | grad norm: 158875.582 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2543/ 159576 | consumed samples: 40688 | elapsed time per iteration (ms): 13675.9 | learning rate: 1.128E-05 | global batch size: 16 | lm loss: 6.921135E+00 | loss scale: 32768.0 | grad norm: 248424.787 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2544/ 159576 | consumed samples: 40704 | elapsed time per iteration (ms): 14035.2 | learning rate: 1.128E-05 | global batch size: 16 | lm loss: 6.734321E+00 | loss scale: 32768.0 | grad norm: 137358.902 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2545/ 159576 | consumed samples: 40720 | elapsed time per iteration (ms): 13685.4 | learning rate: 1.129E-05 | global batch size: 16 | lm loss: 6.824071E+00 | loss scale: 32768.0 | grad norm: 172473.743 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2546/ 159576 | consumed samples: 40736 | elapsed time per iteration (ms): 13704.2 | learning rate: 1.129E-05 | global batch size: 16 | lm loss: 6.741428E+00 | loss scale: 32768.0 | grad norm: 117821.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2547/ 159576 | consumed samples: 40752 | elapsed time per iteration (ms): 13625.1 | learning rate: 1.129E-05 | global batch size: 16 | lm loss: 6.825446E+00 | loss scale: 32768.0 | grad norm: 302813.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2548/ 159576 | consumed samples: 40768 | elapsed time per iteration (ms): 13978.9 | learning rate: 1.130E-05 | global batch size: 16 | lm loss: 6.930991E+00 | loss scale: 32768.0 | grad norm: 163222.779 | num zeros: 0.0 | number of skipped iterations: 0 | number 
of nan iterations: 0 | time (ms) iteration 2549/ 159576 | consumed samples: 40784 | elapsed time per iteration (ms): 13605.2 | learning rate: 1.130E-05 | global batch size: 16 | lm loss: 6.901045E+00 | loss scale: 32768.0 | grad norm: 178776.030 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2550/ 159576 | consumed samples: 40800 | elapsed time per iteration (ms): 13704.5 | learning rate: 1.131E-05 | global batch size: 16 | lm loss: 6.923467E+00 | loss scale: 32768.0 | grad norm: 156500.588 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2551/ 159576 | consumed samples: 40816 | elapsed time per iteration (ms): 13642.0 | learning rate: 1.131E-05 | global batch size: 16 | lm loss: 6.698053E+00 | loss scale: 32768.0 | grad norm: 142885.953 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2552/ 159576 | consumed samples: 40832 | elapsed time per iteration (ms): 13988.3 | learning rate: 1.132E-05 | global batch size: 16 | lm loss: 6.774540E+00 | loss scale: 32768.0 | grad norm: 236886.022 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2553/ 159576 | consumed samples: 40848 | elapsed time per iteration (ms): 13862.8 | learning rate: 1.132E-05 | global batch size: 16 | lm loss: 6.706432E+00 | loss scale: 32768.0 | grad norm: 178546.693 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2554/ 159576 | consumed samples: 40864 | elapsed time per iteration (ms): 13629.3 | learning rate: 1.133E-05 | global batch size: 16 | lm loss: 6.631795E+00 | loss scale: 32768.0 | grad norm: 176739.826 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2555/ 159576 | consumed samples: 40880 | elapsed time per iteration (ms): 13608.3 | learning rate: 1.133E-05 | global batch size: 16 | lm loss: 7.180985E+00 | loss scale: 32768.0 | grad norm: 132584.462 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2556/ 159576 | consumed samples: 40896 | elapsed time per iteration (ms): 13580.0 | learning rate: 1.133E-05 | global batch size: 16 | lm loss: 6.838911E+00 | loss scale: 32768.0 | grad norm: 90158.811 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2557/ 159576 | consumed samples: 40912 | elapsed time per iteration (ms): 13942.7 | learning rate: 1.134E-05 | global batch size: 16 | lm loss: 6.693833E+00 | loss scale: 32768.0 | grad norm: 220674.059 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2558/ 159576 | consumed samples: 40928 | elapsed time per iteration (ms): 13802.7 | learning rate: 1.134E-05 | global batch size: 16 | lm loss: 6.568502E+00 | loss scale: 32768.0 | grad norm: 98298.873 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2559/ 159576 | consumed samples: 40944 | elapsed time per iteration (ms): 13641.3 | learning rate: 1.135E-05 | global batch size: 16 | lm loss: 6.635581E+00 | loss scale: 32768.0 | grad norm: 169974.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2560/ 159576 | consumed samples: 40960 | elapsed time per iteration (ms): 13704.3 | learning rate: 1.135E-05 | global batch size: 16 | lm loss: 
6.565581E+00 | loss scale: 32768.0 | grad norm: 129387.649 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2561/ 159576 | consumed samples: 40976 | elapsed time per iteration (ms): 14001.7 | learning rate: 1.136E-05 | global batch size: 16 | lm loss: 6.892058E+00 | loss scale: 32768.0 | grad norm: 339367.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2562/ 159576 | consumed samples: 40992 | elapsed time per iteration (ms): 13513.6 | learning rate: 1.136E-05 | global batch size: 16 | lm loss: 6.762362E+00 | loss scale: 32768.0 | grad norm: 232794.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2563/ 159576 | consumed samples: 41008 | elapsed time per iteration (ms): 13601.0 | learning rate: 1.137E-05 | global batch size: 16 | lm loss: 6.843441E+00 | loss scale: 32768.0 | grad norm: 163664.983 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2564/ 159576 | consumed samples: 41024 | elapsed time per iteration (ms): 13594.8 | learning rate: 1.137E-05 | global batch size: 16 | lm loss: 6.819015E+00 | loss scale: 32768.0 | grad norm: 216339.488 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2565/ 159576 | consumed samples: 41040 | elapsed time per iteration (ms): 13605.6 | learning rate: 1.137E-05 | global batch size: 16 | lm loss: 6.897832E+00 | loss scale: 32768.0 | grad norm: 109607.502 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2566/ 159576 | consumed samples: 41056 | elapsed time per iteration (ms): 13861.7 | learning rate: 1.138E-05 | global batch size: 16 | lm loss: 6.875384E+00 | loss scale: 32768.0 | grad norm: 402667.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2567/ 159576 | consumed samples: 41072 | elapsed time per iteration (ms): 13673.7 | learning rate: 1.138E-05 | global batch size: 16 | lm loss: 6.761830E+00 | loss scale: 32768.0 | grad norm: 270664.955 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2568/ 159576 | consumed samples: 41088 | elapsed time per iteration (ms): 13672.1 | learning rate: 1.139E-05 | global batch size: 16 | lm loss: 6.792128E+00 | loss scale: 32768.0 | grad norm: 176693.891 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2569/ 159576 | consumed samples: 41104 | elapsed time per iteration (ms): 13609.9 | learning rate: 1.139E-05 | global batch size: 16 | lm loss: 6.935215E+00 | loss scale: 32768.0 | grad norm: 182679.180 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-24 12:06:15] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition) [2021-09-24 12:06:15] PULSE: tr8-104B is running for 6:14:04 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8]) iteration 2570/ 159576 | consumed samples: 41120 | elapsed time per iteration (ms): 14149.7 | learning rate: 1.140E-05 | 
iteration 2570/ 159576 | consumed samples: 41120 | elapsed time per iteration (ms): 14149.7 | learning rate: 1.140E-05 | global batch size: 16 | lm loss: 6.826759E+00 | loss scale: 32768.0 | grad norm: 135711.486 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2571/ 159576 | consumed samples: 41136 | elapsed time per iteration (ms): 13749.2 | learning rate: 1.140E-05 | global batch size: 16 | lm loss: 6.600703E+00 | loss scale: 32768.0 | grad norm: 143461.893 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2572/ 159576 | consumed samples: 41152 | elapsed time per iteration (ms): 13601.5 | learning rate: 1.141E-05 | global batch size: 16 | lm loss: 6.747102E+00 | loss scale: 32768.0 | grad norm: 205480.052 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2573/ 159576 | consumed samples: 41168 | elapsed time per iteration (ms): 13680.7 | learning rate: 1.141E-05 | global batch size: 16 | lm loss: 6.767237E+00 | loss scale: 32768.0 | grad norm: 186807.581 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2574/ 159576 | consumed samples: 41184 | elapsed time per iteration (ms): 14103.7 | learning rate: 1.141E-05 | global batch size: 16 | lm loss: 6.786840E+00 | loss scale: 32768.0 | grad norm: 125986.096 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2575/ 159576 | consumed samples: 41200 | elapsed time per iteration (ms): 13634.6 | learning rate: 1.142E-05 | global batch size: 16 | lm loss: 6.740016E+00 | loss scale: 32768.0 | grad norm: 127578.945 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2576/ 159576 | consumed samples: 41216 | elapsed time per iteration (ms): 13632.4 | learning rate: 1.142E-05 | global batch size: 16 | lm loss: 6.717787E+00 | loss scale: 32768.0 | grad norm: 91352.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2577/ 159576 | consumed samples: 41232 | elapsed time per iteration (ms): 13613.7 | learning rate: 1.143E-05 | global batch size: 16 | lm loss: 6.736307E+00 | loss scale: 32768.0 | grad norm: 161126.891 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2578/ 159576 | consumed samples: 41248 | elapsed time per iteration (ms): 13501.7 | learning rate: 1.143E-05 | global batch size: 16 | lm loss: 6.725785E+00 | loss scale: 32768.0 | grad norm: 105065.485 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2579/ 159576 | consumed samples: 41264 | elapsed time per iteration (ms): 13746.0 | learning rate: 1.144E-05 | global batch size: 16 | lm loss: 6.731723E+00 | loss scale: 32768.0 | grad norm: 123413.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2580/ 159576 | consumed samples: 41280 | elapsed time per iteration (ms): 13621.8 | learning rate: 1.144E-05 | global batch size: 16 | lm loss: 6.889888E+00 | loss scale: 32768.0 | grad norm: 128934.561 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2581/ 159576 | consumed samples: 41296 | elapsed time per iteration (ms): 13634.3 | learning rate: 1.145E-05 | global batch size: 16 | lm loss: 6.845993E+00 | loss scale: 32768.0 | grad norm: 140353.622 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
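Each record in this stream has a fixed pipe-delimited layout: iteration counter, consumed samples, elapsed ms per iteration, learning rate, global batch size, lm loss, loss scale, grad norm, the zero/skipped/nan counters, and a trailing "time (ms)" field that is printed with no value and can be ignored. Because the records run together wherever the log wraps, a small parser is handy for turning the stream back into per-iteration data. The sketch below is ours, assuming this exact field layout; the log filename is hypothetical.

import re

# Field layout assumed from the records above; all names here (including the
# log filename further down) are ours, not from the training scripts.
ITER_RE = re.compile(
    r"iteration\s+(?P<it>\d+)/\s*\d+\s*\|"
    r"\s*consumed samples:\s*(?P<samples>\d+)\s*\|"
    r"\s*elapsed time per iteration \(ms\):\s*(?P<ms>[\d.]+)\s*\|"
    r"\s*learning rate:\s*(?P<lr>[\d.E+-]+)\s*\|"
    r"\s*global batch size:\s*(?P<gbs>\d+)\s*\|"
    r"\s*lm loss:\s*(?P<loss>[\d.E+-]+)\s*\|"
    r"\s*loss scale:\s*(?P<scale>[\d.]+)\s*\|"
    r"\s*grad norm:\s*(?P<gnorm>[\d.]+)"
)

def iter_records(path):
    # finditer copes with records running together across wrapped lines.
    text = open(path).read()
    for m in ITER_RE.finditer(text):
        yield {
            "iteration": int(m["it"]),
            "consumed_samples": int(m["samples"]),
            "ms_per_iteration": float(m["ms"]),
            "learning_rate": float(m["lr"]),
            "lm_loss": float(m["loss"]),
            "loss_scale": float(m["scale"]),
            "grad_norm": float(m["gnorm"]),
        }

# Example: report every change of the fp16 loss scale.
prev = None
for rec in iter_records("tr8-104B.log"):  # hypothetical filename
    if prev is not None and rec["loss_scale"] != prev:
        print(rec["iteration"], "loss scale ->", rec["loss_scale"])
    prev = rec["loss_scale"]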
iteration 2582/ 159576 | consumed samples: 41312 | elapsed time per iteration (ms): 13645.1 | learning rate: 1.145E-05 | global batch size: 16 | lm loss: 6.922751E+00 | loss scale: 32768.0 | grad norm: 193649.510 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2583/ 159576 | consumed samples: 41328 | elapsed time per iteration (ms): 14012.6 | learning rate: 1.145E-05 | global batch size: 16 | lm loss: 6.706060E+00 | loss scale: 32768.0 | grad norm: 120536.730 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2584/ 159576 | consumed samples: 41344 | elapsed time per iteration (ms): 13567.7 | learning rate: 1.146E-05 | global batch size: 16 | lm loss: 6.729124E+00 | loss scale: 32768.0 | grad norm: 150036.593 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2585/ 159576 | consumed samples: 41360 | elapsed time per iteration (ms): 13534.2 | learning rate: 1.146E-05 | global batch size: 16 | lm loss: 6.841982E+00 | loss scale: 32768.0 | grad norm: 169788.083 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2586/ 159576 | consumed samples: 41376 | elapsed time per iteration (ms): 13556.0 | learning rate: 1.147E-05 | global batch size: 16 | lm loss: 6.813578E+00 | loss scale: 32768.0 | grad norm: 120615.854 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2587/ 159576 | consumed samples: 41392 | elapsed time per iteration (ms): 13668.2 | learning rate: 1.147E-05 | global batch size: 16 | lm loss: 6.675393E+00 | loss scale: 32768.0 | grad norm: 202372.780 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2588/ 159576 | consumed samples: 41408 | elapsed time per iteration (ms): 13867.2 | learning rate: 1.148E-05 | global batch size: 16 | lm loss: 6.796386E+00 | loss scale: 32768.0 | grad norm: 131901.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2589/ 159576 | consumed samples: 41424 | elapsed time per iteration (ms): 13636.7 | learning rate: 1.148E-05 | global batch size: 16 | lm loss: 6.783171E+00 | loss scale: 32768.0 | grad norm: 127655.447 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2590/ 159576 | consumed samples: 41440 | elapsed time per iteration (ms): 13677.9 | learning rate: 1.149E-05 | global batch size: 16 | lm loss: 6.672108E+00 | loss scale: 32768.0 | grad norm: 111803.439 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2591/ 159576 | consumed samples: 41456 | elapsed time per iteration (ms): 13670.0 | learning rate: 1.149E-05 | global batch size: 16 | lm loss: 6.894643E+00 | loss scale: 32768.0 | grad norm: 156503.152 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2592/ 159576 | consumed samples: 41472 | elapsed time per iteration (ms): 14137.5 | learning rate: 1.149E-05 | global batch size: 16 | lm loss: 6.765024E+00 | loss scale: 32768.0 | grad norm: 160594.152 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2593/ 159576 | consumed samples: 41488 | elapsed time per iteration (ms): 13635.7 | learning rate: 1.150E-05 | global batch size: 16 | lm loss: 6.882227E+00 | loss scale: 32768.0 
| grad norm: 142008.845 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2594/ 159576 | consumed samples: 41504 | elapsed time per iteration (ms): 13592.8 | learning rate: 1.150E-05 | global batch size: 16 | lm loss: 6.750668E+00 | loss scale: 32768.0 | grad norm: 137376.665 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2595/ 159576 | consumed samples: 41520 | elapsed time per iteration (ms): 13572.7 | learning rate: 1.151E-05 | global batch size: 16 | lm loss: 6.870511E+00 | loss scale: 32768.0 | grad norm: 203139.065 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2596/ 159576 | consumed samples: 41536 | elapsed time per iteration (ms): 13955.3 | learning rate: 1.151E-05 | global batch size: 16 | lm loss: 6.952578E+00 | loss scale: 32768.0 | grad norm: 259660.982 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2597/ 159576 | consumed samples: 41552 | elapsed time per iteration (ms): 13711.6 | learning rate: 1.152E-05 | global batch size: 16 | lm loss: 6.681178E+00 | loss scale: 32768.0 | grad norm: 126907.178 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2598/ 159576 | consumed samples: 41568 | elapsed time per iteration (ms): 13707.8 | learning rate: 1.152E-05 | global batch size: 16 | lm loss: 6.610268E+00 | loss scale: 32768.0 | grad norm: 135897.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2599/ 159576 | consumed samples: 41584 | elapsed time per iteration (ms): 13564.4 | learning rate: 1.153E-05 | global batch size: 16 | lm loss: 6.826151E+00 | loss scale: 32768.0 | grad norm: 155911.584 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2600/ 159576 | consumed samples: 41600 | elapsed time per iteration (ms): 13546.1 | learning rate: 1.153E-05 | global batch size: 16 | lm loss: 6.632576E+00 | loss scale: 32768.0 | grad norm: 252409.904 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2601/ 159576 | consumed samples: 41616 | elapsed time per iteration (ms): 13887.8 | learning rate: 1.153E-05 | global batch size: 16 | lm loss: 6.631788E+00 | loss scale: 32768.0 | grad norm: 165940.601 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2602/ 159576 | consumed samples: 41632 | elapsed time per iteration (ms): 13567.8 | learning rate: 1.154E-05 | global batch size: 16 | lm loss: 6.939396E+00 | loss scale: 32768.0 | grad norm: 124805.953 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2603/ 159576 | consumed samples: 41648 | elapsed time per iteration (ms): 13581.4 | learning rate: 1.154E-05 | global batch size: 16 | lm loss: 6.924129E+00 | loss scale: 32768.0 | grad norm: 133938.726 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2604/ 159576 | consumed samples: 41664 | elapsed time per iteration (ms): 13613.2 | learning rate: 1.155E-05 | global batch size: 16 | lm loss: 6.660190E+00 | loss scale: 32768.0 | grad norm: 188689.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2605/ 159576 | consumed samples: 41680 | elapsed time per 
iteration (ms): 14144.8 | learning rate: 1.155E-05 | global batch size: 16 | lm loss: 6.643148E+00 | loss scale: 32768.0 | grad norm: 123140.550 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2606/ 159576 | consumed samples: 41696 | elapsed time per iteration (ms): 13667.3 | learning rate: 1.156E-05 | global batch size: 16 | lm loss: 6.805959E+00 | loss scale: 32768.0 | grad norm: 196566.691 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2607/ 159576 | consumed samples: 41712 | elapsed time per iteration (ms): 13574.2 | learning rate: 1.156E-05 | global batch size: 16 | lm loss: 6.711599E+00 | loss scale: 32768.0 | grad norm: 167578.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2608/ 159576 | consumed samples: 41728 | elapsed time per iteration (ms): 13571.4 | learning rate: 1.157E-05 | global batch size: 16 | lm loss: 6.852364E+00 | loss scale: 32768.0 | grad norm: 120545.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2609/ 159576 | consumed samples: 41744 | elapsed time per iteration (ms): 13823.4 | learning rate: 1.157E-05 | global batch size: 16 | lm loss: 6.988579E+00 | loss scale: 32768.0 | grad norm: 242130.577 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2610/ 159576 | consumed samples: 41760 | elapsed time per iteration (ms): 13677.8 | learning rate: 1.157E-05 | global batch size: 16 | lm loss: 6.640975E+00 | loss scale: 32768.0 | grad norm: 193270.029 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2611/ 159576 | consumed samples: 41776 | elapsed time per iteration (ms): 13648.9 | learning rate: 1.158E-05 | global batch size: 16 | lm loss: 6.554218E+00 | loss scale: 32768.0 | grad norm: 132307.655 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2612/ 159576 | consumed samples: 41792 | elapsed time per iteration (ms): 13675.5 | learning rate: 1.158E-05 | global batch size: 16 | lm loss: 6.875402E+00 | loss scale: 32768.0 | grad norm: 127017.802 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2613/ 159576 | consumed samples: 41808 | elapsed time per iteration (ms): 13589.6 | learning rate: 1.159E-05 | global batch size: 16 | lm loss: 6.853450E+00 | loss scale: 32768.0 | grad norm: 271835.942 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2614/ 159576 | consumed samples: 41824 | elapsed time per iteration (ms): 13981.2 | learning rate: 1.159E-05 | global batch size: 16 | lm loss: 6.810247E+00 | loss scale: 32768.0 | grad norm: 210644.569 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2615/ 159576 | consumed samples: 41840 | elapsed time per iteration (ms): 13580.3 | learning rate: 1.160E-05 | global batch size: 16 | lm loss: 6.856892E+00 | loss scale: 32768.0 | grad norm: 139996.135 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2616/ 159576 | consumed samples: 41856 | elapsed time per iteration (ms): 13592.7 | learning rate: 1.160E-05 | global batch size: 16 | lm loss: 6.687234E+00 | loss scale: 32768.0 | grad norm: 130216.414 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2617/ 159576 | consumed samples: 41872 | elapsed time per iteration (ms): 13579.5 | learning rate: 1.161E-05 | global batch size: 16 | lm loss: 6.753475E+00 | loss scale: 32768.0 | grad norm: 270435.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2618/ 159576 | consumed samples: 41888 | elapsed time per iteration (ms): 14037.5 | learning rate: 1.161E-05 | global batch size: 16 | lm loss: 6.964073E+00 | loss scale: 32768.0 | grad norm: 185416.747 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2619/ 159576 | consumed samples: 41904 | elapsed time per iteration (ms): 13552.1 | learning rate: 1.161E-05 | global batch size: 16 | lm loss: 6.609634E+00 | loss scale: 32768.0 | grad norm: 157098.176 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2620/ 159576 | consumed samples: 41920 | elapsed time per iteration (ms): 13574.2 | learning rate: 1.162E-05 | global batch size: 16 | lm loss: 7.006974E+00 | loss scale: 32768.0 | grad norm: 140378.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2621/ 159576 | consumed samples: 41936 | elapsed time per iteration (ms): 13648.0 | learning rate: 1.162E-05 | global batch size: 16 | lm loss: 6.562167E+00 | loss scale: 32768.0 | grad norm: 169654.536 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2622/ 159576 | consumed samples: 41952 | elapsed time per iteration (ms): 13713.4 | learning rate: 1.163E-05 | global batch size: 16 | lm loss: 6.810758E+00 | loss scale: 32768.0 | grad norm: 209798.087 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2623/ 159576 | consumed samples: 41968 | elapsed time per iteration (ms): 13925.7 | learning rate: 1.163E-05 | global batch size: 16 | lm loss: 6.522465E+00 | loss scale: 32768.0 | grad norm: 119471.106 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2624/ 159576 | consumed samples: 41984 | elapsed time per iteration (ms): 13583.0 | learning rate: 1.164E-05 | global batch size: 16 | lm loss: 6.827784E+00 | loss scale: 32768.0 | grad norm: 115498.472 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2625/ 159576 | consumed samples: 42000 | elapsed time per iteration (ms): 13618.7 | learning rate: 1.164E-05 | global batch size: 16 | lm loss: 6.663583E+00 | loss scale: 32768.0 | grad norm: 131333.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2626/ 159576 | consumed samples: 42016 | elapsed time per iteration (ms): 13695.0 | learning rate: 1.164E-05 | global batch size: 16 | lm loss: 6.731676E+00 | loss scale: 32768.0 | grad norm: 105476.602 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2627/ 159576 | consumed samples: 42032 | elapsed time per iteration (ms): 14032.3 | learning rate: 1.165E-05 | global batch size: 16 | lm loss: 6.635394E+00 | loss scale: 32768.0 | grad norm: 155841.088 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2628/ 159576 | consumed samples: 42048 | elapsed time per iteration (ms): 13596.4 | learning rate: 1.165E-05 | global 
batch size: 16 | lm loss: 6.768427E+00 | loss scale: 32768.0 | grad norm: 91352.945 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2629/ 159576 | consumed samples: 42064 | elapsed time per iteration (ms): 13735.4 | learning rate: 1.166E-05 | global batch size: 16 | lm loss: 6.877464E+00 | loss scale: 32768.0 | grad norm: 246645.890 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2630/ 159576 | consumed samples: 42080 | elapsed time per iteration (ms): 13558.6 | learning rate: 1.166E-05 | global batch size: 16 | lm loss: 6.714092E+00 | loss scale: 32768.0 | grad norm: 131077.473 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2631/ 159576 | consumed samples: 42096 | elapsed time per iteration (ms): 14063.2 | learning rate: 1.167E-05 | global batch size: 16 | lm loss: 6.598214E+00 | loss scale: 32768.0 | grad norm: 142113.685 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2632/ 159576 | consumed samples: 42112 | elapsed time per iteration (ms): 13570.0 | learning rate: 1.167E-05 | global batch size: 16 | lm loss: 6.958339E+00 | loss scale: 32768.0 | grad norm: 196255.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2633/ 159576 | consumed samples: 42128 | elapsed time per iteration (ms): 13592.6 | learning rate: 1.168E-05 | global batch size: 16 | lm loss: 6.596231E+00 | loss scale: 32768.0 | grad norm: 167680.420 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2634/ 159576 | consumed samples: 42144 | elapsed time per iteration (ms): 13671.7 | learning rate: 1.168E-05 | global batch size: 16 | lm loss: 6.775526E+00 | loss scale: 32768.0 | grad norm: 111055.921 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2635/ 159576 | consumed samples: 42160 | elapsed time per iteration (ms): 13642.2 | learning rate: 1.168E-05 | global batch size: 16 | lm loss: 6.786438E+00 | loss scale: 32768.0 | grad norm: 146172.944 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2636/ 159576 | consumed samples: 42176 | elapsed time per iteration (ms): 14001.7 | learning rate: 1.169E-05 | global batch size: 16 | lm loss: 6.785826E+00 | loss scale: 32768.0 | grad norm: 101705.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2637/ 159576 | consumed samples: 42192 | elapsed time per iteration (ms): 13632.3 | learning rate: 1.169E-05 | global batch size: 16 | lm loss: 6.918137E+00 | loss scale: 32768.0 | grad norm: 359289.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2638/ 159576 | consumed samples: 42208 | elapsed time per iteration (ms): 13642.4 | learning rate: 1.170E-05 | global batch size: 16 | lm loss: 6.474925E+00 | loss scale: 32768.0 | grad norm: 210644.789 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2639/ 159576 | consumed samples: 42224 | elapsed time per iteration (ms): 13584.1 | learning rate: 1.170E-05 | global batch size: 16 | lm loss: 6.622705E+00 | loss scale: 32768.0 | grad norm: 159853.582 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 
2640/ 159576 | consumed samples: 42240 | elapsed time per iteration (ms): 13928.4 | learning rate: 1.171E-05 | global batch size: 16 | lm loss: 6.883276E+00 | loss scale: 32768.0 | grad norm: 134874.626 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2641/ 159576 | consumed samples: 42256 | elapsed time per iteration (ms): 13672.3 | learning rate: 1.171E-05 | global batch size: 16 | lm loss: 6.975843E+00 | loss scale: 32768.0 | grad norm: 136138.664 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2642/ 159576 | consumed samples: 42272 | elapsed time per iteration (ms): 13705.7 | learning rate: 1.172E-05 | global batch size: 16 | lm loss: 6.698567E+00 | loss scale: 32768.0 | grad norm: 132708.794 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2643/ 159576 | consumed samples: 42288 | elapsed time per iteration (ms): 13640.4 | learning rate: 1.172E-05 | global batch size: 16 | lm loss: 6.910300E+00 | loss scale: 32768.0 | grad norm: 128937.691 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2644/ 159576 | consumed samples: 42304 | elapsed time per iteration (ms): 13924.6 | learning rate: 1.172E-05 | global batch size: 16 | lm loss: 6.661136E+00 | loss scale: 32768.0 | grad norm: 144385.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2645/ 159576 | consumed samples: 42320 | elapsed time per iteration (ms): 13731.5 | learning rate: 1.173E-05 | global batch size: 16 | lm loss: 6.749330E+00 | loss scale: 32768.0 | grad norm: 136497.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2646/ 159576 | consumed samples: 42336 | elapsed time per iteration (ms): 13631.6 | learning rate: 1.173E-05 | global batch size: 16 | lm loss: 6.774727E+00 | loss scale: 32768.0 | grad norm: 157115.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2647/ 159576 | consumed samples: 42352 | elapsed time per iteration (ms): 13587.3 | learning rate: 1.174E-05 | global batch size: 16 | lm loss: 6.897247E+00 | loss scale: 32768.0 | grad norm: 122884.703 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2648/ 159576 | consumed samples: 42368 | elapsed time per iteration (ms): 13582.9 | learning rate: 1.174E-05 | global batch size: 16 | lm loss: 6.902627E+00 | loss scale: 32768.0 | grad norm: 136617.675 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2649/ 159576 | consumed samples: 42384 | elapsed time per iteration (ms): 14194.1 | learning rate: 1.175E-05 | global batch size: 16 | lm loss: 6.654990E+00 | loss scale: 32768.0 | grad norm: 121668.456 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2650/ 159576 | consumed samples: 42400 | elapsed time per iteration (ms): 13827.0 | learning rate: 1.175E-05 | global batch size: 16 | lm loss: 6.718140E+00 | loss scale: 32768.0 | grad norm: 94592.966 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2651/ 159576 | consumed samples: 42416 | elapsed time per iteration (ms): 13600.7 | learning rate: 1.176E-05 | global batch size: 16 | lm loss: 6.674122E+00 | loss scale: 32768.0 | grad 
norm: 105220.566 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2652/ 159576 | consumed samples: 42432 | elapsed time per iteration (ms): 13643.1 | learning rate: 1.176E-05 | global batch size: 16 | lm loss: 6.662145E+00 | loss scale: 32768.0 | grad norm: 222158.908 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2653/ 159576 | consumed samples: 42448 | elapsed time per iteration (ms): 13957.5 | learning rate: 1.176E-05 | global batch size: 16 | lm loss: 6.613699E+00 | loss scale: 32768.0 | grad norm: 110830.033 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2654/ 159576 | consumed samples: 42464 | elapsed time per iteration (ms): 13668.1 | learning rate: 1.177E-05 | global batch size: 16 | lm loss: 6.510882E+00 | loss scale: 32768.0 | grad norm: 143615.139 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2655/ 159576 | consumed samples: 42480 | elapsed time per iteration (ms): 13633.2 | learning rate: 1.177E-05 | global batch size: 16 | lm loss: 6.732093E+00 | loss scale: 32768.0 | grad norm: 159462.660 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2656/ 159576 | consumed samples: 42496 | elapsed time per iteration (ms): 13620.1 | learning rate: 1.178E-05 | global batch size: 16 | lm loss: 6.660037E+00 | loss scale: 32768.0 | grad norm: 244166.739 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2657/ 159576 | consumed samples: 42512 | elapsed time per iteration (ms): 13831.3 | learning rate: 1.178E-05 | global batch size: 16 | lm loss: 6.626472E+00 | loss scale: 32768.0 | grad norm: 149275.048 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2658/ 159576 | consumed samples: 42528 | elapsed time per iteration (ms): 13824.8 | learning rate: 1.179E-05 | global batch size: 16 | lm loss: 6.687421E+00 | loss scale: 32768.0 | grad norm: 139977.063 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2659/ 159576 | consumed samples: 42544 | elapsed time per iteration (ms): 13722.5 | learning rate: 1.179E-05 | global batch size: 16 | lm loss: 6.524724E+00 | loss scale: 32768.0 | grad norm: 106042.464 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2660/ 159576 | consumed samples: 42560 | elapsed time per iteration (ms): 13670.7 | learning rate: 1.180E-05 | global batch size: 16 | lm loss: 6.908322E+00 | loss scale: 32768.0 | grad norm: 201686.670 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2661/ 159576 | consumed samples: 42576 | elapsed time per iteration (ms): 13612.7 | learning rate: 1.180E-05 | global batch size: 16 | lm loss: 6.837928E+00 | loss scale: 32768.0 | grad norm: 126017.738 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2662/ 159576 | consumed samples: 42592 | elapsed time per iteration (ms): 13941.2 | learning rate: 1.180E-05 | global batch size: 16 | lm loss: 6.439098E+00 | loss scale: 32768.0 | grad norm: 160984.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2663/ 159576 | consumed samples: 42608 | elapsed time per iteration 
(ms): 13713.4 | learning rate: 1.181E-05 | global batch size: 16 | lm loss: 6.723923E+00 | loss scale: 32768.0 | grad norm: 139598.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2664/ 159576 | consumed samples: 42624 | elapsed time per iteration (ms): 6797.7 | learning rate: 1.181E-05 | global batch size: 16 | lm loss: 7.335284E+00 | loss scale: 32768.0 | grad norm: 139598.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2665/ 159576 | consumed samples: 42640 | elapsed time per iteration (ms): 13135.0 | learning rate: 1.181E-05 | global batch size: 16 | lm loss: 6.985713E+00 | loss scale: 32768.0 | grad norm: 180390.498 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2666/ 159576 | consumed samples: 42656 | elapsed time per iteration (ms): 13618.0 | learning rate: 1.182E-05 | global batch size: 16 | lm loss: 6.556298E+00 | loss scale: 32768.0 | grad norm: 144470.571 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2667/ 159576 | consumed samples: 42672 | elapsed time per iteration (ms): 14126.5 | learning rate: 1.182E-05 | global batch size: 16 | lm loss: 7.063251E+00 | loss scale: 32768.0 | grad norm: 146115.736 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2668/ 159576 | consumed samples: 42688 | elapsed time per iteration (ms): 13677.8 | learning rate: 1.183E-05 | global batch size: 16 | lm loss: 6.846446E+00 | loss scale: 32768.0 | grad norm: 164938.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2669/ 159576 | consumed samples: 42704 | elapsed time per iteration (ms): 13662.5 | learning rate: 1.183E-05 | global batch size: 16 | lm loss: 6.704443E+00 | loss scale: 32768.0 | grad norm: 183338.838 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2670/ 159576 | consumed samples: 42720 | elapsed time per iteration (ms): 13752.8 | learning rate: 1.184E-05 | global batch size: 16 | lm loss: 6.828314E+00 | loss scale: 32768.0 | grad norm: 291659.916 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2671/ 159576 | consumed samples: 42736 | elapsed time per iteration (ms): 14053.5 | learning rate: 1.184E-05 | global batch size: 16 | lm loss: 6.701608E+00 | loss scale: 32768.0 | grad norm: 137566.756 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2672/ 159576 | consumed samples: 42752 | elapsed time per iteration (ms): 13555.7 | learning rate: 1.184E-05 | global batch size: 16 | lm loss: 6.495778E+00 | loss scale: 32768.0 | grad norm: 140566.748 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2673/ 159576 | consumed samples: 42768 | elapsed time per iteration (ms): 13625.0 | learning rate: 1.185E-05 | global batch size: 16 | lm loss: 6.868438E+00 | loss scale: 32768.0 | grad norm: 137822.671 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2674/ 159576 | consumed samples: 42784 | elapsed time per iteration (ms): 13681.3 | learning rate: 1.185E-05 | global batch size: 16 | lm loss: 6.855990E+00 | loss scale: 32768.0 | grad norm: 217925.291 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | time (ms) iteration 2675/ 159576 | consumed samples: 42800 | elapsed time per iteration (ms): 13726.3 | learning rate: 1.186E-05 | global batch size: 16 | lm loss: 6.726338E+00 | loss scale: 32768.0 | grad norm: 169676.723 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2676/ 159576 | consumed samples: 42816 | elapsed time per iteration (ms): 14028.2 | learning rate: 1.186E-05 | global batch size: 16 | lm loss: 6.632861E+00 | loss scale: 32768.0 | grad norm: 146027.824 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2677/ 159576 | consumed samples: 42832 | elapsed time per iteration (ms): 13624.3 | learning rate: 1.187E-05 | global batch size: 16 | lm loss: 6.642831E+00 | loss scale: 32768.0 | grad norm: 163148.856 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2678/ 159576 | consumed samples: 42848 | elapsed time per iteration (ms): 13717.5 | learning rate: 1.187E-05 | global batch size: 16 | lm loss: 6.689285E+00 | loss scale: 32768.0 | grad norm: 129142.991 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2679/ 159576 | consumed samples: 42864 | elapsed time per iteration (ms): 13575.7 | learning rate: 1.188E-05 | global batch size: 16 | lm loss: 6.577474E+00 | loss scale: 32768.0 | grad norm: 168075.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2680/ 159576 | consumed samples: 42880 | elapsed time per iteration (ms): 13990.7 | learning rate: 1.188E-05 | global batch size: 16 | lm loss: 6.806996E+00 | loss scale: 32768.0 | grad norm: 138707.563 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2681/ 159576 | consumed samples: 42896 | elapsed time per iteration (ms): 13614.3 | learning rate: 1.188E-05 | global batch size: 16 | lm loss: 6.616170E+00 | loss scale: 32768.0 | grad norm: 138396.885 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2682/ 159576 | consumed samples: 42912 | elapsed time per iteration (ms): 13528.4 | learning rate: 1.189E-05 | global batch size: 16 | lm loss: 6.760321E+00 | loss scale: 32768.0 | grad norm: 146622.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2683/ 159576 | consumed samples: 42928 | elapsed time per iteration (ms): 13595.4 | learning rate: 1.189E-05 | global batch size: 16 | lm loss: 6.828167E+00 | loss scale: 32768.0 | grad norm: 205452.941 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2684/ 159576 | consumed samples: 42944 | elapsed time per iteration (ms): 14090.0 | learning rate: 1.190E-05 | global batch size: 16 | lm loss: 6.974781E+00 | loss scale: 32768.0 | grad norm: 141438.762 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2685/ 159576 | consumed samples: 42960 | elapsed time per iteration (ms): 13490.5 | learning rate: 1.190E-05 | global batch size: 16 | lm loss: 6.720265E+00 | loss scale: 32768.0 | grad norm: 131667.640 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2686/ 159576 | consumed samples: 42976 | elapsed time per iteration (ms): 13606.4 | learning rate: 1.191E-05 | global batch size: 16 | lm 
loss: 6.645846E+00 | loss scale: 32768.0 | grad norm: 143915.440 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2687/ 159576 | consumed samples: 42992 | elapsed time per iteration (ms): 13579.9 | learning rate: 1.191E-05 | global batch size: 16 | lm loss: 6.852206E+00 | loss scale: 32768.0 | grad norm: 206032.603 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2688/ 159576 | consumed samples: 43008 | elapsed time per iteration (ms): 13654.7 | learning rate: 1.192E-05 | global batch size: 16 | lm loss: 6.708066E+00 | loss scale: 32768.0 | grad norm: 135547.494 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2689/ 159576 | consumed samples: 43024 | elapsed time per iteration (ms): 13756.9 | learning rate: 1.192E-05 | global batch size: 16 | lm loss: 6.627333E+00 | loss scale: 32768.0 | grad norm: 103806.748 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2690/ 159576 | consumed samples: 43040 | elapsed time per iteration (ms): 13560.8 | learning rate: 1.192E-05 | global batch size: 16 | lm loss: 6.624159E+00 | loss scale: 32768.0 | grad norm: 204724.023 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2691/ 159576 | consumed samples: 43056 | elapsed time per iteration (ms): 13656.6 | learning rate: 1.193E-05 | global batch size: 16 | lm loss: 6.803893E+00 | loss scale: 32768.0 | grad norm: 123248.563 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2692/ 159576 | consumed samples: 43072 | elapsed time per iteration (ms): 13672.9 | learning rate: 1.193E-05 | global batch size: 16 | lm loss: 6.801785E+00 | loss scale: 32768.0 | grad norm: 140785.815 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2693/ 159576 | consumed samples: 43088 | elapsed time per iteration (ms): 14015.4 | learning rate: 1.194E-05 | global batch size: 16 | lm loss: 6.464381E+00 | loss scale: 32768.0 | grad norm: 131615.707 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2694/ 159576 | consumed samples: 43104 | elapsed time per iteration (ms): 13588.1 | learning rate: 1.194E-05 | global batch size: 16 | lm loss: 6.727094E+00 | loss scale: 32768.0 | grad norm: 213544.967 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2695/ 159576 | consumed samples: 43120 | elapsed time per iteration (ms): 13608.1 | learning rate: 1.195E-05 | global batch size: 16 | lm loss: 6.930735E+00 | loss scale: 32768.0 | grad norm: 179180.455 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2696/ 159576 | consumed samples: 43136 | elapsed time per iteration (ms): 13594.8 | learning rate: 1.195E-05 | global batch size: 16 | lm loss: 6.652137E+00 | loss scale: 32768.0 | grad norm: 171091.491 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2697/ 159576 | consumed samples: 43152 | elapsed time per iteration (ms): 13943.3 | learning rate: 1.196E-05 | global batch size: 16 | lm loss: 6.731685E+00 | loss scale: 32768.0 | grad norm: 151811.525 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2698/ 159576 | 
consumed samples: 43168 | elapsed time per iteration (ms): 13773.1 | learning rate: 1.196E-05 | global batch size: 16 | lm loss: 7.081783E+00 | loss scale: 32768.0 | grad norm: 132367.994 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2699/ 159576 | consumed samples: 43184 | elapsed time per iteration (ms): 13644.6 | learning rate: 1.196E-05 | global batch size: 16 | lm loss: 6.806893E+00 | loss scale: 32768.0 | grad norm: 319459.435 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2700/ 159576 | consumed samples: 43200 | elapsed time per iteration (ms): 13698.5 | learning rate: 1.197E-05 | global batch size: 16 | lm loss: 6.666497E+00 | loss scale: 32768.0 | grad norm: 120927.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2701/ 159576 | consumed samples: 43216 | elapsed time per iteration (ms): 13684.8 | learning rate: 1.197E-05 | global batch size: 16 | lm loss: 6.701412E+00 | loss scale: 32768.0 | grad norm: 150633.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2702/ 159576 | consumed samples: 43232 | elapsed time per iteration (ms): 13780.3 | learning rate: 1.198E-05 | global batch size: 16 | lm loss: 6.594296E+00 | loss scale: 32768.0 | grad norm: 161110.656 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2703/ 159576 | consumed samples: 43248 | elapsed time per iteration (ms): 13593.9 | learning rate: 1.198E-05 | global batch size: 16 | lm loss: 6.808178E+00 | loss scale: 32768.0 | grad norm: 258358.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2704/ 159576 | consumed samples: 43264 | elapsed time per iteration (ms): 13635.4 | learning rate: 1.199E-05 | global batch size: 16 | lm loss: 6.815506E+00 | loss scale: 32768.0 | grad norm: 183028.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2705/ 159576 | consumed samples: 43280 | elapsed time per iteration (ms): 13605.1 | learning rate: 1.199E-05 | global batch size: 16 | lm loss: 6.967249E+00 | loss scale: 32768.0 | grad norm: 243583.534 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2706/ 159576 | consumed samples: 43296 | elapsed time per iteration (ms): 14130.1 | learning rate: 1.200E-05 | global batch size: 16 | lm loss: 7.062543E+00 | loss scale: 32768.0 | grad norm: 207737.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2707/ 159576 | consumed samples: 43312 | elapsed time per iteration (ms): 13561.8 | learning rate: 1.200E-05 | global batch size: 16 | lm loss: 6.758321E+00 | loss scale: 32768.0 | grad norm: 146527.588 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2708/ 159576 | consumed samples: 43328 | elapsed time per iteration (ms): 13722.0 | learning rate: 1.200E-05 | global batch size: 16 | lm loss: 6.584868E+00 | loss scale: 32768.0 | grad norm: 272015.780 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2709/ 159576 | consumed samples: 43344 | elapsed time per iteration (ms): 13654.1 | learning rate: 1.201E-05 | global batch size: 16 | lm loss: 6.709559E+00 | loss scale: 32768.0 | grad norm: 284012.046 
| num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2710/ 159576 | consumed samples: 43360 | elapsed time per iteration (ms): 13595.7 | learning rate: 1.201E-05 | global batch size: 16 | lm loss: 6.830414E+00 | loss scale: 32768.0 | grad norm: 149403.503 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2711/ 159576 | consumed samples: 43376 | elapsed time per iteration (ms): 13973.4 | learning rate: 1.202E-05 | global batch size: 16 | lm loss: 6.624958E+00 | loss scale: 32768.0 | grad norm: 146777.014 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2712/ 159576 | consumed samples: 43392 | elapsed time per iteration (ms): 13700.0 | learning rate: 1.202E-05 | global batch size: 16 | lm loss: 6.735670E+00 | loss scale: 32768.0 | grad norm: 136631.989 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2713/ 159576 | consumed samples: 43408 | elapsed time per iteration (ms): 13572.3 | learning rate: 1.203E-05 | global batch size: 16 | lm loss: 6.765169E+00 | loss scale: 32768.0 | grad norm: 280479.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2714/ 159576 | consumed samples: 43424 | elapsed time per iteration (ms): 13642.4 | learning rate: 1.203E-05 | global batch size: 16 | lm loss: 6.622662E+00 | loss scale: 32768.0 | grad norm: 160875.579 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2715/ 159576 | consumed samples: 43440 | elapsed time per iteration (ms): 14122.3 | learning rate: 1.204E-05 | global batch size: 16 | lm loss: 6.730956E+00 | loss scale: 32768.0 | grad norm: 206409.146 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2716/ 159576 | consumed samples: 43456 | elapsed time per iteration (ms): 13831.1 | learning rate: 1.204E-05 | global batch size: 16 | lm loss: 6.767645E+00 | loss scale: 32768.0 | grad norm: 149352.449 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2717/ 159576 | consumed samples: 43472 | elapsed time per iteration (ms): 13572.9 | learning rate: 1.204E-05 | global batch size: 16 | lm loss: 6.975914E+00 | loss scale: 32768.0 | grad norm: 119850.584 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2718/ 159576 | consumed samples: 43488 | elapsed time per iteration (ms): 13686.9 | learning rate: 1.205E-05 | global batch size: 16 | lm loss: 6.919794E+00 | loss scale: 32768.0 | grad norm: 172348.990 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2719/ 159576 | consumed samples: 43504 | elapsed time per iteration (ms): 13976.8 | learning rate: 1.205E-05 | global batch size: 16 | lm loss: 6.652202E+00 | loss scale: 32768.0 | grad norm: 178184.791 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2720/ 159576 | consumed samples: 43520 | elapsed time per iteration (ms): 13571.8 | learning rate: 1.206E-05 | global batch size: 16 | lm loss: 6.787558E+00 | loss scale: 32768.0 | grad norm: 130225.615 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2721/ 159576 | consumed samples: 43536 | elapsed time per iteration (ms): 13693.7 | 
learning rate: 1.206E-05 | global batch size: 16 | lm loss: 6.660249E+00 | loss scale: 32768.0 | grad norm: 144428.996 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2722/ 159576 | consumed samples: 43552 | elapsed time per iteration (ms): 13646.9 | learning rate: 1.207E-05 | global batch size: 16 | lm loss: 6.661267E+00 | loss scale: 32768.0 | grad norm: 121995.599 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2723/ 159576 | consumed samples: 43568 | elapsed time per iteration (ms): 13718.1 | learning rate: 1.207E-05 | global batch size: 16 | lm loss: 6.702977E+00 | loss scale: 32768.0 | grad norm: 205375.821 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2724/ 159576 | consumed samples: 43584 | elapsed time per iteration (ms): 14072.2 | learning rate: 1.208E-05 | global batch size: 16 | lm loss: 6.859900E+00 | loss scale: 32768.0 | grad norm: 174185.553 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2725/ 159576 | consumed samples: 43600 | elapsed time per iteration (ms): 13643.1 | learning rate: 1.208E-05 | global batch size: 16 | lm loss: 6.642687E+00 | loss scale: 32768.0 | grad norm: 124356.151 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2726/ 159576 | consumed samples: 43616 | elapsed time per iteration (ms): 13637.6 | learning rate: 1.208E-05 | global batch size: 16 | lm loss: 6.849540E+00 | loss scale: 32768.0 | grad norm: 187912.708 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2727/ 159576 | consumed samples: 43632 | elapsed time per iteration (ms): 13570.5 | learning rate: 1.209E-05 | global batch size: 16 | lm loss: 6.505477E+00 | loss scale: 32768.0 | grad norm: 146429.461 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2728/ 159576 | consumed samples: 43648 | elapsed time per iteration (ms): 14179.1 | learning rate: 1.209E-05 | global batch size: 16 | lm loss: 6.763928E+00 | loss scale: 32768.0 | grad norm: 143016.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2729/ 159576 | consumed samples: 43664 | elapsed time per iteration (ms): 13666.5 | learning rate: 1.210E-05 | global batch size: 16 | lm loss: 6.746594E+00 | loss scale: 32768.0 | grad norm: 184649.070 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2730/ 159576 | consumed samples: 43680 | elapsed time per iteration (ms): 13666.9 | learning rate: 1.210E-05 | global batch size: 16 | lm loss: 6.822509E+00 | loss scale: 32768.0 | grad norm: 258599.749 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2731/ 159576 | consumed samples: 43696 | elapsed time per iteration (ms): 13722.5 | learning rate: 1.211E-05 | global batch size: 16 | lm loss: 6.726813E+00 | loss scale: 32768.0 | grad norm: 135253.086 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2732/ 159576 | consumed samples: 43712 | elapsed time per iteration (ms): 14110.6 | learning rate: 1.211E-05 | global batch size: 16 | lm loss: 6.642574E+00 | loss scale: 32768.0 | grad norm: 187051.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | time (ms) iteration 2733/ 159576 | consumed samples: 43728 | elapsed time per iteration (ms): 13665.7 | learning rate: 1.212E-05 | global batch size: 16 | lm loss: 6.608624E+00 | loss scale: 32768.0 | grad norm: 164163.009 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2734/ 159576 | consumed samples: 43744 | elapsed time per iteration (ms): 13624.6 | learning rate: 1.212E-05 | global batch size: 16 | lm loss: 6.755674E+00 | loss scale: 32768.0 | grad norm: 129230.586 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2735/ 159576 | consumed samples: 43760 | elapsed time per iteration (ms): 13617.1 | learning rate: 1.212E-05 | global batch size: 16 | lm loss: 6.771841E+00 | loss scale: 32768.0 | grad norm: 254766.602 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2736/ 159576 | consumed samples: 43776 | elapsed time per iteration (ms): 13675.3 | learning rate: 1.213E-05 | global batch size: 16 | lm loss: 6.677852E+00 | loss scale: 32768.0 | grad norm: 142644.144 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2737/ 159576 | consumed samples: 43792 | elapsed time per iteration (ms): 13983.3 | learning rate: 1.213E-05 | global batch size: 16 | lm loss: 6.719501E+00 | loss scale: 32768.0 | grad norm: 164953.828 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2738/ 159576 | consumed samples: 43808 | elapsed time per iteration (ms): 13774.1 | learning rate: 1.214E-05 | global batch size: 16 | lm loss: 6.637510E+00 | loss scale: 32768.0 | grad norm: 161949.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2739/ 159576 | consumed samples: 43824 | elapsed time per iteration (ms): 13780.8 | learning rate: 1.214E-05 | global batch size: 16 | lm loss: 6.670253E+00 | loss scale: 32768.0 | grad norm: 132053.899 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2740/ 159576 | consumed samples: 43840 | elapsed time per iteration (ms): 13656.5 | learning rate: 1.215E-05 | global batch size: 16 | lm loss: 6.701370E+00 | loss scale: 32768.0 | grad norm: 158609.635 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2741/ 159576 | consumed samples: 43856 | elapsed time per iteration (ms): 13970.4 | learning rate: 1.215E-05 | global batch size: 16 | lm loss: 6.676120E+00 | loss scale: 32768.0 | grad norm: 133079.118 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2742/ 159576 | consumed samples: 43872 | elapsed time per iteration (ms): 13572.9 | learning rate: 1.216E-05 | global batch size: 16 | lm loss: 6.666083E+00 | loss scale: 32768.0 | grad norm: 121076.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2743/ 159576 | consumed samples: 43888 | elapsed time per iteration (ms): 13635.9 | learning rate: 1.216E-05 | global batch size: 16 | lm loss: 6.594894E+00 | loss scale: 32768.0 | grad norm: 206897.979 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2744/ 159576 | consumed samples: 43904 | elapsed time per iteration (ms): 13681.8 | learning rate: 1.216E-05 | global batch size: 16 | lm loss: 
6.700480E+00 | loss scale: 32768.0 | grad norm: 126037.905 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2745/ 159576 | consumed samples: 43920 | elapsed time per iteration (ms): 13966.9 | learning rate: 1.217E-05 | global batch size: 16 | lm loss: 6.708483E+00 | loss scale: 32768.0 | grad norm: 136172.741 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2746/ 159576 | consumed samples: 43936 | elapsed time per iteration (ms): 13758.4 | learning rate: 1.217E-05 | global batch size: 16 | lm loss: 6.629419E+00 | loss scale: 32768.0 | grad norm: 142570.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2747/ 159576 | consumed samples: 43952 | elapsed time per iteration (ms): 13668.5 | learning rate: 1.218E-05 | global batch size: 16 | lm loss: 6.597517E+00 | loss scale: 32768.0 | grad norm: 155237.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2748/ 159576 | consumed samples: 43968 | elapsed time per iteration (ms): 13633.2 | learning rate: 1.218E-05 | global batch size: 16 | lm loss: 6.561327E+00 | loss scale: 32768.0 | grad norm: 162642.892 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2749/ 159576 | consumed samples: 43984 | elapsed time per iteration (ms): 13608.4 | learning rate: 1.219E-05 | global batch size: 16 | lm loss: 6.677460E+00 | loss scale: 32768.0 | grad norm: 192650.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2750/ 159576 | consumed samples: 44000 | elapsed time per iteration (ms): 13886.7 | learning rate: 1.219E-05 | global batch size: 16 | lm loss: 6.649335E+00 | loss scale: 32768.0 | grad norm: 171673.975 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2751/ 159576 | consumed samples: 44016 | elapsed time per iteration (ms): 13671.6 | learning rate: 1.220E-05 | global batch size: 16 | lm loss: 6.735415E+00 | loss scale: 32768.0 | grad norm: 128822.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2752/ 159576 | consumed samples: 44032 | elapsed time per iteration (ms): 13708.1 | learning rate: 1.220E-05 | global batch size: 16 | lm loss: 6.679979E+00 | loss scale: 32768.0 | grad norm: 253310.737 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2753/ 159576 | consumed samples: 44048 | elapsed time per iteration (ms): 13770.7 | learning rate: 1.220E-05 | global batch size: 16 | lm loss: 6.565764E+00 | loss scale: 32768.0 | grad norm: 116179.545 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2754/ 159576 | consumed samples: 44064 | elapsed time per iteration (ms): 14066.6 | learning rate: 1.221E-05 | global batch size: 16 | lm loss: 6.742185E+00 | loss scale: 32768.0 | grad norm: 141403.598 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2755/ 159576 | consumed samples: 44080 | elapsed time per iteration (ms): 13651.8 | learning rate: 1.221E-05 | global batch size: 16 | lm loss: 6.762599E+00 | loss scale: 32768.0 | grad norm: 111172.995 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2756/ 159576 | consumed 
samples: 44096 | elapsed time per iteration (ms): 13694.5 | learning rate: 1.222E-05 | global batch size: 16 | lm loss: 6.733878E+00 | loss scale: 32768.0 | grad norm: 128168.972 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2757/ 159576 | consumed samples: 44112 | elapsed time per iteration (ms): 13604.8 | learning rate: 1.222E-05 | global batch size: 16 | lm loss: 6.588708E+00 | loss scale: 32768.0 | grad norm: 103022.500 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2758/ 159576 | consumed samples: 44128 | elapsed time per iteration (ms): 13653.9 | learning rate: 1.223E-05 | global batch size: 16 | lm loss: 6.562719E+00 | loss scale: 32768.0 | grad norm: 138192.892 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2759/ 159576 | consumed samples: 44144 | elapsed time per iteration (ms): 13986.1 | learning rate: 1.223E-05 | global batch size: 16 | lm loss: 6.738625E+00 | loss scale: 32768.0 | grad norm: 121839.165 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2760/ 159576 | consumed samples: 44160 | elapsed time per iteration (ms): 13725.3 | learning rate: 1.224E-05 | global batch size: 16 | lm loss: 6.566117E+00 | loss scale: 32768.0 | grad norm: 104901.052 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2761/ 159576 | consumed samples: 44176 | elapsed time per iteration (ms): 13770.1 | learning rate: 1.224E-05 | global batch size: 16 | lm loss: 6.666871E+00 | loss scale: 32768.0 | grad norm: 123398.519 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2762/ 159576 | consumed samples: 44192 | elapsed time per iteration (ms): 13627.5 | learning rate: 1.224E-05 | global batch size: 16 | lm loss: 6.835371E+00 | loss scale: 32768.0 | grad norm: 112214.547 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2763/ 159576 | consumed samples: 44208 | elapsed time per iteration (ms): 14068.3 | learning rate: 1.225E-05 | global batch size: 16 | lm loss: 6.804303E+00 | loss scale: 32768.0 | grad norm: 122506.789 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2764/ 159576 | consumed samples: 44224 | elapsed time per iteration (ms): 6917.6 | learning rate: 1.225E-05 | global batch size: 16 | lm loss: 6.972560E+00 | loss scale: 16384.0 | grad norm: 122506.789 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2765/ 159576 | consumed samples: 44240 | elapsed time per iteration (ms): 13181.9 | learning rate: 1.225E-05 | global batch size: 16 | lm loss: 6.580292E+00 | loss scale: 16384.0 | grad norm: 59992.079 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2766/ 159576 | consumed samples: 44256 | elapsed time per iteration (ms): 13680.1 | learning rate: 1.226E-05 | global batch size: 16 | lm loss: 6.724333E+00 | loss scale: 16384.0 | grad norm: 77015.113 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2767/ 159576 | consumed samples: 44272 | elapsed time per iteration (ms): 13716.6 | learning rate: 1.226E-05 | global batch size: 16 | lm loss: 6.933354E+00 | loss scale: 16384.0 | grad norm: 85522.390 | num zeros: 
iteration 2768/ 159576 | consumed samples: 44288 | elapsed time per iteration (ms): 13994.0 | learning rate: 1.227E-05 | global batch size: 16 | lm loss: 6.648163E+00 | loss scale: 16384.0 | grad norm: 58295.975 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2769/ 159576 | consumed samples: 44304 | elapsed time per iteration (ms): 13658.9 | learning rate: 1.227E-05 | global batch size: 16 | lm loss: 6.891530E+00 | loss scale: 16384.0 | grad norm: 75446.588 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2770/ 159576 | consumed samples: 44320 | elapsed time per iteration (ms): 13703.7 | learning rate: 1.228E-05 | global batch size: 16 | lm loss: 6.591332E+00 | loss scale: 16384.0 | grad norm: 59290.056 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2771/ 159576 | consumed samples: 44336 | elapsed time per iteration (ms): 13716.9 | learning rate: 1.228E-05 | global batch size: 16 | lm loss: 6.737020E+00 | loss scale: 16384.0 | grad norm: 51929.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2772/ 159576 | consumed samples: 44352 | elapsed time per iteration (ms): 14010.7 | learning rate: 1.228E-05 | global batch size: 16 | lm loss: 6.565439E+00 | loss scale: 16384.0 | grad norm: 100304.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2773/ 159576 | consumed samples: 44368 | elapsed time per iteration (ms): 13566.2 | learning rate: 1.229E-05 | global batch size: 16 | lm loss: 6.887408E+00 | loss scale: 16384.0 | grad norm: 86699.024 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2774/ 159576 | consumed samples: 44384 | elapsed time per iteration (ms): 13639.1 | learning rate: 1.229E-05 | global batch size: 16 | lm loss: 6.766156E+00 | loss scale: 16384.0 | grad norm: 64840.948 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2775/ 159576 | consumed samples: 44400 | elapsed time per iteration (ms): 13646.1 | learning rate: 1.230E-05 | global batch size: 16 | lm loss: 6.640082E+00 | loss scale: 16384.0 | grad norm: 61943.696 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2776/ 159576 | consumed samples: 44416 | elapsed time per iteration (ms): 13670.4 | learning rate: 1.230E-05 | global batch size: 16 | lm loss: 6.784959E+00 | loss scale: 16384.0 | grad norm: 68978.844 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2777/ 159576 | consumed samples: 44432 | elapsed time per iteration (ms): 14012.8 | learning rate: 1.231E-05 | global batch size: 16 | lm loss: 6.670368E+00 | loss scale: 16384.0 | grad norm: 58668.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2778/ 159576 | consumed samples: 44448 | elapsed time per iteration (ms): 13651.5 | learning rate: 1.231E-05 | global batch size: 16 | lm loss: 6.849538E+00 | loss scale: 16384.0 | grad norm: 53539.454 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2779/ 159576 | consumed samples: 44464 | elapsed time per iteration (ms): 13531.1 | learning rate: 1.232E-05 | global batch size: 16 | lm loss: 6.710807E+00 | loss scale: 16384.0 | grad norm: 58047.417 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2780/ 159576 | consumed samples: 44480 | elapsed time per iteration (ms): 13601.2 | learning rate: 1.232E-05 | global batch size: 16 | lm loss: 6.803576E+00 | loss scale: 16384.0 | grad norm: 61014.969 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2781/ 159576 | consumed samples: 44496 | elapsed time per iteration (ms): 14011.6 | learning rate: 1.232E-05 | global batch size: 16 | lm loss: 6.435648E+00 | loss scale: 16384.0 | grad norm: 72928.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2782/ 159576 | consumed samples: 44512 | elapsed time per iteration (ms): 13706.9 | learning rate: 1.233E-05 | global batch size: 16 | lm loss: 6.689322E+00 | loss scale: 16384.0 | grad norm: 45124.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2783/ 159576 | consumed samples: 44528 | elapsed time per iteration (ms): 13638.0 | learning rate: 1.233E-05 | global batch size: 16 | lm loss: 6.796506E+00 | loss scale: 16384.0 | grad norm: 61254.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2784/ 159576 | consumed samples: 44544 | elapsed time per iteration (ms): 13617.3 | learning rate: 1.234E-05 | global batch size: 16 | lm loss: 6.726316E+00 | loss scale: 16384.0 | grad norm: 58102.179 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2785/ 159576 | consumed samples: 44560 | elapsed time per iteration (ms): 13946.8 | learning rate: 1.234E-05 | global batch size: 16 | lm loss: 6.648038E+00 | loss scale: 16384.0 | grad norm: 68282.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2786/ 159576 | consumed samples: 44576 | elapsed time per iteration (ms): 13594.9 | learning rate: 1.235E-05 | global batch size: 16 | lm loss: 6.860110E+00 | loss scale: 16384.0 | grad norm: 70475.870 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2787/ 159576 | consumed samples: 44592 | elapsed time per iteration (ms): 13607.8 | learning rate: 1.235E-05 | global batch size: 16 | lm loss: 6.821939E+00 | loss scale: 16384.0 | grad norm: 56499.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2788/ 159576 | consumed samples: 44608 | elapsed time per iteration (ms): 13592.1 | learning rate: 1.236E-05 | global batch size: 16 | lm loss: 6.702363E+00 | loss scale: 16384.0 | grad norm: 71878.494 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2789/ 159576 | consumed samples: 44624 | elapsed time per iteration (ms): 13633.0 | learning rate: 1.236E-05 | global batch size: 16 | lm loss: 6.596258E+00 | loss scale: 16384.0 | grad norm: 57167.131 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2790/ 159576 | consumed samples: 44640 | elapsed time per iteration (ms): 13806.2 | learning rate: 1.236E-05 | global batch size: 16 | lm loss: 6.742100E+00 | loss scale: 16384.0 | grad norm: 78591.535 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2791/ 159576 | consumed samples: 44656 | elapsed time per iteration (ms): 13659.4 | learning rate: 1.237E-05 | global batch size: 16 | lm loss: 6.602869E+00 | loss scale: 16384.0 | grad norm: 68726.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2792/ 159576 | consumed samples: 44672 | elapsed time per iteration (ms): 13592.2 | learning rate: 1.237E-05 | global batch size: 16 | lm loss: 6.708993E+00 | loss scale: 16384.0 | grad norm: 98214.491 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2793/ 159576 | consumed samples: 44688 | elapsed time per iteration (ms): 13507.3 | learning rate: 1.238E-05 | global batch size: 16 | lm loss: 6.616965E+00 | loss scale: 16384.0 | grad norm: 72150.719 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2794/ 159576 | consumed samples: 44704 | elapsed time per iteration (ms): 13955.1 | learning rate: 1.238E-05 | global batch size: 16 | lm loss: 6.607640E+00 | loss scale: 16384.0 | grad norm: 62728.696 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2795/ 159576 | consumed samples: 44720 | elapsed time per iteration (ms): 13531.1 | learning rate: 1.239E-05 | global batch size: 16 | lm loss: 6.875388E+00 | loss scale: 16384.0 | grad norm: 94768.672 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2796/ 159576 | consumed samples: 44736 | elapsed time per iteration (ms): 13614.2 | learning rate: 1.239E-05 | global batch size: 16 | lm loss: 6.827682E+00 | loss scale: 16384.0 | grad norm: 59818.476 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2797/ 159576 | consumed samples: 44752 | elapsed time per iteration (ms): 13620.6 | learning rate: 1.239E-05 | global batch size: 16 | lm loss: 6.522869E+00 | loss scale: 16384.0 | grad norm: 74009.172 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2798/ 159576 | consumed samples: 44768 | elapsed time per iteration (ms): 13985.4 | learning rate: 1.240E-05 | global batch size: 16 | lm loss: 6.654684E+00 | loss scale: 16384.0 | grad norm: 54913.035 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2799/ 159576 | consumed samples: 44784 | elapsed time per iteration (ms): 13759.4 | learning rate: 1.240E-05 | global batch size: 16 | lm loss: 6.544140E+00 | loss scale: 16384.0 | grad norm: 83654.114 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2800/ 159576 | consumed samples: 44800 | elapsed time per iteration (ms): 13524.0 | learning rate: 1.241E-05 | global batch size: 16 | lm loss: 6.798269E+00 | loss scale: 16384.0 | grad norm: 80678.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2801/ 159576 | consumed samples: 44816 | elapsed time per iteration (ms): 13646.5 | learning rate: 1.241E-05 | global batch size: 16 | lm loss: 6.872281E+00 | loss scale: 16384.0 | grad norm: 49084.933 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2802/ 159576 | consumed samples: 44832 | elapsed time per iteration (ms): 13614.0 | learning rate: 1.242E-05 | global batch size: 16 | lm loss: 6.733764E+00 | loss scale: 16384.0 | grad norm: 88585.751 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2803/ 159576 | consumed samples: 44848 | elapsed time per iteration (ms): 13792.4 | learning rate: 1.242E-05 | global batch size: 16 | lm loss: 6.865559E+00 | loss scale: 16384.0 | grad norm: 48186.949 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2804/ 159576 | consumed samples: 44864 | elapsed time per iteration (ms): 13655.0 | learning rate: 1.243E-05 | global batch size: 16 | lm loss: 6.631515E+00 | loss scale: 16384.0 | grad norm: 66281.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2805/ 159576 | consumed samples: 44880 | elapsed time per iteration (ms): 13605.4 | learning rate: 1.243E-05 | global batch size: 16 | lm loss: 6.593436E+00 | loss scale: 16384.0 | grad norm: 66274.800 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2806/ 159576 | consumed samples: 44896 | elapsed time per iteration (ms): 13611.6 | learning rate: 1.243E-05 | global batch size: 16 | lm loss: 6.692297E+00 | loss scale: 16384.0 | grad norm: 66535.812 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2807/ 159576 | consumed samples: 44912 | elapsed time per iteration (ms): 13924.4 | learning rate: 1.244E-05 | global batch size: 16 | lm loss: 6.564488E+00 | loss scale: 16384.0 | grad norm: 62289.026 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2808/ 159576 | consumed samples: 44928 | elapsed time per iteration (ms): 13559.5 | learning rate: 1.244E-05 | global batch size: 16 | lm loss: 6.775381E+00 | loss scale: 16384.0 | grad norm: 51114.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2809/ 159576 | consumed samples: 44944 | elapsed time per iteration (ms): 13579.6 | learning rate: 1.245E-05 | global batch size: 16 | lm loss: 6.854599E+00 | loss scale: 16384.0 | grad norm: 78574.479 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2810/ 159576 | consumed samples: 44960 | elapsed time per iteration (ms): 13568.8 | learning rate: 1.245E-05 | global batch size: 16 | lm loss: 6.641658E+00 | loss scale: 16384.0 | grad norm: 48054.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2811/ 159576 | consumed samples: 44976 | elapsed time per iteration (ms): 13577.2 | learning rate: 1.246E-05 | global batch size: 16 | lm loss: 6.804714E+00 | loss scale: 16384.0 | grad norm: 85293.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2812/ 159576 | consumed samples: 44992 | elapsed time per iteration (ms): 13780.4 | learning rate: 1.246E-05 | global batch size: 16 | lm loss: 6.484572E+00 | loss scale: 16384.0 | grad norm: 54599.094 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2813/ 159576 | consumed samples: 45008 | elapsed time per iteration (ms): 13630.2 | learning rate: 1.247E-05 | global batch size: 16 | lm loss: 6.495656E+00 | loss scale: 16384.0 | grad norm: 131722.081 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2814/ 159576 | consumed samples: 45024 | elapsed time per iteration (ms): 13626.8 | learning rate: 1.247E-05 | global batch size: 16 | lm loss: 6.894939E+00 | loss scale: 16384.0 | grad norm: 102881.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2815/ 159576 | consumed samples: 45040 | elapsed time per iteration (ms): 13599.0 | learning rate: 1.247E-05 | global batch size: 16 | lm loss: 6.883965E+00 | loss scale: 16384.0 | grad norm: 72100.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2816/ 159576 | consumed samples: 45056 | elapsed time per iteration (ms): 14052.1 | learning rate: 1.248E-05 | global batch size: 16 | lm loss: 6.573022E+00 | loss scale: 16384.0 | grad norm: 72968.507 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2817/ 159576 | consumed samples: 45072 | elapsed time per iteration (ms): 13541.1 | learning rate: 1.248E-05 | global batch size: 16 | lm loss: 6.646833E+00 | loss scale: 16384.0 | grad norm: 90510.016 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2818/ 159576 | consumed samples: 45088 | elapsed time per iteration (ms): 13597.7 | learning rate: 1.249E-05 | global batch size: 16 | lm loss: 6.898618E+00 | loss scale: 16384.0 | grad norm: 90037.566 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2819/ 159576 | consumed samples: 45104 | elapsed time per iteration (ms): 13575.0 | learning rate: 1.249E-05 | global batch size: 16 | lm loss: 6.547668E+00 | loss scale: 16384.0 | grad norm: 79277.803 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2820/ 159576 | consumed samples: 45120 | elapsed time per iteration (ms): 14016.3 | learning rate: 1.250E-05 | global batch size: 16 | lm loss: 6.791230E+00 | loss scale: 16384.0 | grad norm: 63437.139 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2821/ 159576 | consumed samples: 45136 | elapsed time per iteration (ms): 13565.5 | learning rate: 1.250E-05 | global batch size: 16 | lm loss: 6.957808E+00 | loss scale: 16384.0 | grad norm: 56738.743 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2822/ 159576 | consumed samples: 45152 | elapsed time per iteration (ms): 13564.0 | learning rate: 1.251E-05 | global batch size: 16 | lm loss: 6.729958E+00 | loss scale: 16384.0 | grad norm: 93778.013 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2823/ 159576 | consumed samples: 45168 | elapsed time per iteration (ms): 13650.0 | learning rate: 1.251E-05 | global batch size: 16 | lm loss: 6.480144E+00 | loss scale: 16384.0 | grad norm: 60246.483 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2824/ 159576 | consumed samples: 45184 | elapsed time per iteration (ms): 13511.5 | learning rate: 1.251E-05 | global batch size: 16 | lm loss: 6.595847E+00 | loss scale: 16384.0 | grad norm: 63557.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2825/ 159576 | consumed samples: 45200 | elapsed time per iteration (ms): 13655.5 | learning rate: 1.252E-05 | global batch size: 16 | lm loss: 6.689149E+00 | loss scale: 16384.0 | grad norm: 67372.582 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2826/ 159576 | consumed samples: 45216 | elapsed time per iteration (ms): 13638.0 | learning rate: 1.252E-05 | global batch size: 16 | lm loss: 6.689507E+00 | loss scale: 16384.0 | grad norm: 69124.069 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2827/ 159576 | consumed samples: 45232 | elapsed time per iteration (ms): 13546.1 | learning rate: 1.253E-05 | global batch size: 16 | lm loss: 6.457958E+00 | loss scale: 16384.0 | grad norm: 56160.018 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2828/ 159576 | consumed samples: 45248 | elapsed time per iteration (ms): 13610.9 | learning rate: 1.253E-05 | global batch size: 16 | lm loss: 6.815155E+00 | loss scale: 16384.0 | grad norm: 61009.082 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2829/ 159576 | consumed samples: 45264 | elapsed time per iteration (ms): 13930.1 | learning rate: 1.254E-05 | global batch size: 16 | lm loss: 6.595886E+00 | loss scale: 16384.0 | grad norm: 45906.579 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2830/ 159576 | consumed samples: 45280 | elapsed time per iteration (ms): 13608.1 | learning rate: 1.254E-05 | global batch size: 16 | lm loss: 6.642846E+00 | loss scale: 16384.0 | grad norm: 74796.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2831/ 159576 | consumed samples: 45296 | elapsed time per iteration (ms): 13539.5 | learning rate: 1.255E-05 | global batch size: 16 | lm loss: 6.810493E+00 | loss scale: 16384.0 | grad norm: 64536.090 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2832/ 159576 | consumed samples: 45312 | elapsed time per iteration (ms): 13571.9 | learning rate: 1.255E-05 | global batch size: 16 | lm loss: 6.742997E+00 | loss scale: 16384.0 | grad norm: 54697.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2833/ 159576 | consumed samples: 45328 | elapsed time per iteration (ms): 13626.3 | learning rate: 1.255E-05 | global batch size: 16 | lm loss: 6.734198E+00 | loss scale: 16384.0 | grad norm: 124253.673 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2834/ 159576 | consumed samples: 45344 | elapsed time per iteration (ms): 13905.8 | learning rate: 1.256E-05 | global batch size: 16 | lm loss: 6.652834E+00 | loss scale: 16384.0 | grad norm: 71659.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-24 13:06:39] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition)
[2021-09-24 13:06:39] PULSE: tr8-104B is running for 7:14:28 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8])
iteration 2835/ 159576 | consumed samples: 45360 | elapsed time per iteration (ms): 13609.5 | learning rate: 1.256E-05 | global batch size: 16 | lm loss: 6.789959E+00 | loss scale: 16384.0 | grad norm: 73488.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
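The PULSE lines above come from the run's monitoring: one SLURM job (1162855_1) is training on the listed nodes of the 'gpu_p13' partition while a follow-up job array (1165978_[1-10%1]) waits behind it. The '%1' throttle allows at most one array element to run at a time, so the elements form a chain of back-to-back jobs that each resume from the latest checkpoint. A hedged sketch of how such a chain can be submitted (illustrative helper, not the project's actual launcher script):

    import subprocess

    def submit_chain(script_path: str, n_slots: int = 10) -> str:
        """Submit an n_slots-element job array throttled to one concurrent element."""
        result = subprocess.run(
            ["sbatch", f"--array=1-{n_slots}%1", script_path],
            check=True, capture_output=True, text=True,
        )
        return result.stdout.strip()   # e.g. "Submitted batch job 1165978"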
iteration 2836/ 159576 | consumed samples: 45376 | elapsed time per iteration (ms): 13614.7 | learning rate: 1.257E-05 | global batch size: 16 | lm loss: 6.695529E+00 | loss scale: 16384.0 | grad norm: 69307.839 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2837/ 159576 | consumed samples: 45392 | elapsed time per iteration (ms): 13634.1 | learning rate: 1.257E-05 | global batch size: 16 | lm loss: 6.550642E+00 | loss scale: 16384.0 | grad norm: 88157.717 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2838/ 159576 | consumed samples: 45408 | elapsed time per iteration (ms): 14029.3 | learning rate: 1.258E-05 | global batch size: 16 | lm loss: 6.745864E+00 | loss scale: 16384.0 | grad norm: 79032.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2839/ 159576 | consumed samples: 45424 | elapsed time per iteration (ms): 13631.7 | learning rate: 1.258E-05 | global batch size: 16 | lm loss: 7.013217E+00 | loss scale: 16384.0 | grad norm: 90598.683 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2840/ 159576 | consumed samples: 45440 | elapsed time per iteration (ms): 13552.2 | learning rate: 1.259E-05 | global batch size: 16 | lm loss: 6.791473E+00 | loss scale: 16384.0 | grad norm: 66761.526 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2841/ 159576 | consumed samples: 45456 | elapsed time per iteration (ms): 13585.4 | learning rate: 1.259E-05 | global batch size: 16 | lm loss: 6.639102E+00 | loss scale: 16384.0 | grad norm: 75945.826 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2842/ 159576 | consumed samples: 45472 | elapsed time per iteration (ms): 14005.5 | learning rate: 1.259E-05 | global batch size: 16 | lm loss: 6.750570E+00 | loss scale: 16384.0 | grad norm: 52422.045 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2843/ 159576 | consumed samples: 45488 | elapsed time per iteration (ms): 13637.6 | learning rate: 1.260E-05 | global batch size: 16 | lm loss: 6.761233E+00 | loss scale: 16384.0 | grad norm: 96201.801 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2844/ 159576 | consumed samples: 45504 | elapsed time per iteration (ms): 13605.0 | learning rate: 1.260E-05 | global batch size: 16 | lm loss: 6.869712E+00 | loss scale: 16384.0 | grad norm: 85259.060 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2845/ 159576 | consumed samples: 45520 | elapsed time per iteration (ms): 13489.6 | learning rate: 1.261E-05 | global batch size: 16 | lm loss: 6.754227E+00 | loss scale: 16384.0 | grad norm: 71430.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2846/ 159576 | consumed samples: 45536 | elapsed time per iteration (ms): 13633.0 | learning rate: 1.261E-05 | global batch size: 16 | lm loss: 6.681328E+00 | loss scale: 16384.0 | grad norm: 64498.648 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2847/ 159576 | consumed samples: 45552 | elapsed time per iteration (ms): 13680.5 | learning rate: 1.262E-05 | global batch size: 16 | lm loss: 6.708944E+00 | loss scale: 16384.0 | grad norm: 99300.512 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2848/ 159576 | consumed samples: 45568 | elapsed time per iteration (ms): 13578.9 | learning rate: 1.262E-05 | global batch size: 16 | lm loss: 6.689048E+00 | loss scale: 16384.0 | grad norm: 90482.932 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2849/ 159576 | consumed samples: 45584 | elapsed time per iteration (ms): 13613.6 | learning rate: 1.263E-05 | global batch size: 16 | lm loss: 6.673044E+00 | loss scale: 16384.0 | grad norm: 59461.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2850/ 159576 | consumed samples: 45600 | elapsed time per iteration (ms): 13675.0 | learning rate: 1.263E-05 | global batch size: 16 | lm loss: 6.738005E+00 | loss scale: 16384.0 | grad norm: 101125.933 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2851/ 159576 | consumed samples: 45616 | elapsed time per iteration (ms): 13897.5 | learning rate: 1.263E-05 | global batch size: 16 | lm loss: 6.522173E+00 | loss scale: 16384.0 | grad norm: 90321.174 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2852/ 159576 | consumed samples: 45632 | elapsed time per iteration (ms): 13599.3 | learning rate: 1.264E-05 | global batch size: 16 | lm loss: 6.524035E+00 | loss scale: 16384.0 | grad norm: 70117.709 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2853/ 159576 | consumed samples: 45648 | elapsed time per iteration (ms): 13643.7 | learning rate: 1.264E-05 | global batch size: 16 | lm loss: 6.510409E+00 | loss scale: 16384.0 | grad norm: 64993.085 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2854/ 159576 | consumed samples: 45664 | elapsed time per iteration (ms): 13552.1 | learning rate: 1.265E-05 | global batch size: 16 | lm loss: 6.913634E+00 | loss scale: 16384.0 | grad norm: 106101.567 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2855/ 159576 | consumed samples: 45680 | elapsed time per iteration (ms): 13759.3 | learning rate: 1.265E-05 | global batch size: 16 | lm loss: 6.640407E+00 | loss scale: 16384.0 | grad norm: 114581.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2856/ 159576 | consumed samples: 45696 | elapsed time per iteration (ms): 13808.3 | learning rate: 1.266E-05 | global batch size: 16 | lm loss: 6.781041E+00 | loss scale: 16384.0 | grad norm: 56604.166 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2857/ 159576 | consumed samples: 45712 | elapsed time per iteration (ms): 13620.2 | learning rate: 1.266E-05 | global batch size: 16 | lm loss: 6.794811E+00 | loss scale: 16384.0 | grad norm: 60150.039 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2858/ 159576 | consumed samples: 45728 | elapsed time per iteration (ms): 13675.9 | learning rate: 1.267E-05 | global batch size: 16 | lm loss: 6.586791E+00 | loss scale: 16384.0 | grad norm: 100786.813 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2859/ 159576 | consumed samples: 45744 | elapsed time per iteration (ms): 13583.4 | learning rate: 1.267E-05 | global batch size: 16 | lm loss: 6.762810E+00 | loss scale: 16384.0 | grad norm: 82968.484 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
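The learning-rate column climbs by a fixed amount per batch of consumed samples (1.222E-05 at ~44k samples, 1.328E-05 at ~48k), i.e. the run is still in a linear warmup driven by consumed samples rather than by iteration count. A sketch of that schedule; max_lr and warmup_samples below are illustrative assumptions, not the run's confirmed settings:

    def warmup_lr(consumed_samples: int,
                  max_lr: float = 6e-5,
                  warmup_samples: int = 216_320) -> float:
        """Linear warmup by samples, then flat (the decay phase is omitted)."""
        if consumed_samples >= warmup_samples:
            return max_lr
        return max_lr * consumed_samples / warmup_samples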
iteration 2860/ 159576 | consumed samples: 45760 | elapsed time per iteration (ms): 13906.7 | learning rate: 1.267E-05 | global batch size: 16 | lm loss: 6.739496E+00 | loss scale: 16384.0 | grad norm: 51306.674 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2861/ 159576 | consumed samples: 45776 | elapsed time per iteration (ms): 13619.1 | learning rate: 1.268E-05 | global batch size: 16 | lm loss: 6.046006E+00 | loss scale: 16384.0 | grad norm: 70726.137 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2862/ 159576 | consumed samples: 45792 | elapsed time per iteration (ms): 13544.2 | learning rate: 1.268E-05 | global batch size: 16 | lm loss: 6.803837E+00 | loss scale: 16384.0 | grad norm: 68740.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2863/ 159576 | consumed samples: 45808 | elapsed time per iteration (ms): 13610.8 | learning rate: 1.269E-05 | global batch size: 16 | lm loss: 6.770112E+00 | loss scale: 16384.0 | grad norm: 139814.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2864/ 159576 | consumed samples: 45824 | elapsed time per iteration (ms): 13958.0 | learning rate: 1.269E-05 | global batch size: 16 | lm loss: 6.750904E+00 | loss scale: 16384.0 | grad norm: 77621.986 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2865/ 159576 | consumed samples: 45840 | elapsed time per iteration (ms): 13670.7 | learning rate: 1.270E-05 | global batch size: 16 | lm loss: 6.696413E+00 | loss scale: 16384.0 | grad norm: 71170.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2866/ 159576 | consumed samples: 45856 | elapsed time per iteration (ms): 13638.6 | learning rate: 1.270E-05 | global batch size: 16 | lm loss: 6.704915E+00 | loss scale: 16384.0 | grad norm: 101640.576 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2867/ 159576 | consumed samples: 45872 | elapsed time per iteration (ms): 13607.2 | learning rate: 1.271E-05 | global batch size: 16 | lm loss: 6.825719E+00 | loss scale: 16384.0 | grad norm: 75740.165 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2868/ 159576 | consumed samples: 45888 | elapsed time per iteration (ms): 13630.4 | learning rate: 1.271E-05 | global batch size: 16 | lm loss: 6.287379E+00 | loss scale: 16384.0 | grad norm: 102389.724 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2869/ 159576 | consumed samples: 45904 | elapsed time per iteration (ms): 13745.4 | learning rate: 1.271E-05 | global batch size: 16 | lm loss: 6.541815E+00 | loss scale: 16384.0 | grad norm: 70149.993 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2870/ 159576 | consumed samples: 45920 | elapsed time per iteration (ms): 13607.8 | learning rate: 1.272E-05 | global batch size: 16 | lm loss: 6.516257E+00 | loss scale: 16384.0 | grad norm: 75996.012 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2871/ 159576 | consumed samples: 45936 | elapsed time per iteration (ms): 13612.1 | learning rate: 1.272E-05 | global batch size: 16 | lm loss: 6.478125E+00 | loss scale: 16384.0 | grad norm: 71923.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2872/ 159576 | consumed samples: 45952 | elapsed time per iteration (ms): 13608.0 | learning rate: 1.273E-05 | global batch size: 16 | lm loss: 6.691109E+00 | loss scale: 16384.0 | grad norm: 87426.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2873/ 159576 | consumed samples: 45968 | elapsed time per iteration (ms): 13976.7 | learning rate: 1.273E-05 | global batch size: 16 | lm loss: 6.620930E+00 | loss scale: 16384.0 | grad norm: 104041.099 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2874/ 159576 | consumed samples: 45984 | elapsed time per iteration (ms): 13607.9 | learning rate: 1.274E-05 | global batch size: 16 | lm loss: 6.744573E+00 | loss scale: 16384.0 | grad norm: 69927.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2875/ 159576 | consumed samples: 46000 | elapsed time per iteration (ms): 13661.2 | learning rate: 1.274E-05 | global batch size: 16 | lm loss: 6.676423E+00 | loss scale: 16384.0 | grad norm: 51002.581 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2876/ 159576 | consumed samples: 46016 | elapsed time per iteration (ms): 13531.2 | learning rate: 1.275E-05 | global batch size: 16 | lm loss: 6.802640E+00 | loss scale: 16384.0 | grad norm: 87004.969 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2877/ 159576 | consumed samples: 46032 | elapsed time per iteration (ms): 13901.7 | learning rate: 1.275E-05 | global batch size: 16 | lm loss: 6.729659E+00 | loss scale: 16384.0 | grad norm: 50767.745 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2878/ 159576 | consumed samples: 46048 | elapsed time per iteration (ms): 13702.1 | learning rate: 1.275E-05 | global batch size: 16 | lm loss: 6.922673E+00 | loss scale: 16384.0 | grad norm: 121433.743 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2879/ 159576 | consumed samples: 46064 | elapsed time per iteration (ms): 13605.9 | learning rate: 1.276E-05 | global batch size: 16 | lm loss: 6.701990E+00 | loss scale: 16384.0 | grad norm: 78796.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2880/ 159576 | consumed samples: 46080 | elapsed time per iteration (ms): 13615.6 | learning rate: 1.276E-05 | global batch size: 16 | lm loss: 6.650718E+00 | loss scale: 16384.0 | grad norm: 68193.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2881/ 159576 | consumed samples: 46096 | elapsed time per iteration (ms): 13595.5 | learning rate: 1.277E-05 | global batch size: 16 | lm loss: 6.732479E+00 | loss scale: 16384.0 | grad norm: 69049.929 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2882/ 159576 | consumed samples: 46112 | elapsed time per iteration (ms): 13888.6 | learning rate: 1.277E-05 | global batch size: 16 | lm loss: 6.563155E+00 | loss scale: 16384.0 | grad norm: 84383.583 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2883/ 159576 | consumed samples: 46128 | elapsed time per iteration (ms): 13560.8 | learning rate: 1.278E-05 | global batch size: 16 | lm loss: 6.406487E+00 | loss scale: 16384.0 | grad norm: 66632.669 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2884/ 159576 | consumed samples: 46144 | elapsed time per iteration (ms): 13502.0 | learning rate: 1.278E-05 | global batch size: 16 | lm loss: 6.748409E+00 | loss scale: 16384.0 | grad norm: 69626.540 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2885/ 159576 | consumed samples: 46160 | elapsed time per iteration (ms): 13526.3 | learning rate: 1.279E-05 | global batch size: 16 | lm loss: 6.474768E+00 | loss scale: 16384.0 | grad norm: 43811.525 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2886/ 159576 | consumed samples: 46176 | elapsed time per iteration (ms): 13863.4 | learning rate: 1.279E-05 | global batch size: 16 | lm loss: 6.661960E+00 | loss scale: 16384.0 | grad norm: 71612.680 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2887/ 159576 | consumed samples: 46192 | elapsed time per iteration (ms): 13578.7 | learning rate: 1.279E-05 | global batch size: 16 | lm loss: 6.511534E+00 | loss scale: 16384.0 | grad norm: 60456.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2888/ 159576 | consumed samples: 46208 | elapsed time per iteration (ms): 13588.8 | learning rate: 1.280E-05 | global batch size: 16 | lm loss: 6.689698E+00 | loss scale: 16384.0 | grad norm: 101410.640 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2889/ 159576 | consumed samples: 46224 | elapsed time per iteration (ms): 13621.2 | learning rate: 1.280E-05 | global batch size: 16 | lm loss: 6.679986E+00 | loss scale: 16384.0 | grad norm: 74313.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2890/ 159576 | consumed samples: 46240 | elapsed time per iteration (ms): 13599.6 | learning rate: 1.281E-05 | global batch size: 16 | lm loss: 6.579202E+00 | loss scale: 16384.0 | grad norm: 53116.582 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2891/ 159576 | consumed samples: 46256 | elapsed time per iteration (ms): 13965.8 | learning rate: 1.281E-05 | global batch size: 16 | lm loss: 6.841757E+00 | loss scale: 16384.0 | grad norm: 71980.947 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2892/ 159576 | consumed samples: 46272 | elapsed time per iteration (ms): 13517.0 | learning rate: 1.282E-05 | global batch size: 16 | lm loss: 6.555973E+00 | loss scale: 16384.0 | grad norm: 90572.552 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2893/ 159576 | consumed samples: 46288 | elapsed time per iteration (ms): 13525.5 | learning rate: 1.282E-05 | global batch size: 16 | lm loss: 6.857316E+00 | loss scale: 16384.0 | grad norm: 60488.506 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2894/ 159576 | consumed samples: 46304 | elapsed time per iteration (ms): 13541.9 | learning rate: 1.283E-05 | global batch size: 16 | lm loss: 6.685534E+00 | loss scale: 16384.0 | grad norm: 69134.968 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2895/ 159576 | consumed samples: 46320 | elapsed time per iteration (ms): 14148.5 | learning rate: 1.283E-05 | global batch size: 16 | lm loss: 6.805571E+00 | loss scale: 16384.0 | grad norm: 57858.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2896/ 159576 | consumed samples: 46336 | elapsed time per iteration (ms): 13614.8 | learning rate: 1.283E-05 | global batch size: 16 | lm loss: 6.839938E+00 | loss scale: 16384.0 | grad norm: 146916.459 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2897/ 159576 | consumed samples: 46352 | elapsed time per iteration (ms): 13601.5 | learning rate: 1.284E-05 | global batch size: 16 | lm loss: 6.725083E+00 | loss scale: 16384.0 | grad norm: 101921.781 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2898/ 159576 | consumed samples: 46368 | elapsed time per iteration (ms): 13584.0 | learning rate: 1.284E-05 | global batch size: 16 | lm loss: 7.088351E+00 | loss scale: 16384.0 | grad norm: 78883.090 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2899/ 159576 | consumed samples: 46384 | elapsed time per iteration (ms): 14019.6 | learning rate: 1.285E-05 | global batch size: 16 | lm loss: 6.874489E+00 | loss scale: 16384.0 | grad norm: 79406.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2900/ 159576 | consumed samples: 46400 | elapsed time per iteration (ms): 13571.5 | learning rate: 1.285E-05 | global batch size: 16 | lm loss: 6.735637E+00 | loss scale: 16384.0 | grad norm: 58170.467 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2901/ 159576 | consumed samples: 46416 | elapsed time per iteration (ms): 13559.8 | learning rate: 1.286E-05 | global batch size: 16 | lm loss: 6.789194E+00 | loss scale: 16384.0 | grad norm: 153130.501 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2902/ 159576 | consumed samples: 46432 | elapsed time per iteration (ms): 13570.5 | learning rate: 1.286E-05 | global batch size: 16 | lm loss: 6.734316E+00 | loss scale: 16384.0 | grad norm: 116070.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2903/ 159576 | consumed samples: 46448 | elapsed time per iteration (ms): 13629.7 | learning rate: 1.287E-05 | global batch size: 16 | lm loss: 6.743185E+00 | loss scale: 16384.0 | grad norm: 76970.593 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2904/ 159576 | consumed samples: 46464 | elapsed time per iteration (ms): 13980.9 | learning rate: 1.287E-05 | global batch size: 16 | lm loss: 6.742231E+00 | loss scale: 16384.0 | grad norm: 79904.122 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2905/ 159576 | consumed samples: 46480 | elapsed time per iteration (ms): 13647.6 | learning rate: 1.287E-05 | global batch size: 16 | lm loss: 6.785865E+00 | loss scale: 16384.0 | grad norm: 66541.967 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
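The grad norm column is the global (all-parameter) L2 norm of the gradients before clipping; spikes like 153130.501 at iteration 2901 against a roughly 50k-90k baseline are the kind of instability signal this run was being watched for. A minimal sketch of global-norm gradient clipping as typically configured in Megatron-style runs (the clip threshold here is an assumption):

    import math

    def clip_grad_norm_(grads, max_norm: float = 1.0) -> float:
        """grads: iterable of numpy arrays (or torch tensors); scaled in place."""
        total = math.sqrt(sum(float((g ** 2).sum()) for g in grads))
        if total > max_norm:
            for g in grads:
                g *= max_norm / (total + 1e-6)
        return total    # the pre-clipping norm is what gets logged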
iteration 2906/ 159576 | consumed samples: 46496 | elapsed time per iteration (ms): 13586.1 | learning rate: 1.288E-05 | global batch size: 16 | lm loss: 6.669911E+00 | loss scale: 16384.0 | grad norm: 76560.935 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2907/ 159576 | consumed samples: 46512 | elapsed time per iteration (ms): 13521.3 | learning rate: 1.288E-05 | global batch size: 16 | lm loss: 6.723244E+00 | loss scale: 16384.0 | grad norm: 103466.024 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2908/ 159576 | consumed samples: 46528 | elapsed time per iteration (ms): 13824.4 | learning rate: 1.289E-05 | global batch size: 16 | lm loss: 6.584032E+00 | loss scale: 16384.0 | grad norm: 73252.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2909/ 159576 | consumed samples: 46544 | elapsed time per iteration (ms): 13578.9 | learning rate: 1.289E-05 | global batch size: 16 | lm loss: 6.804316E+00 | loss scale: 16384.0 | grad norm: 70073.019 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2910/ 159576 | consumed samples: 46560 | elapsed time per iteration (ms): 13556.4 | learning rate: 1.290E-05 | global batch size: 16 | lm loss: 6.673604E+00 | loss scale: 16384.0 | grad norm: 109090.622 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2911/ 159576 | consumed samples: 46576 | elapsed time per iteration (ms): 13604.0 | learning rate: 1.290E-05 | global batch size: 16 | lm loss: 6.599095E+00 | loss scale: 16384.0 | grad norm: 57781.446 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2912/ 159576 | consumed samples: 46592 | elapsed time per iteration (ms): 13587.1 | learning rate: 1.291E-05 | global batch size: 16 | lm loss: 6.753370E+00 | loss scale: 16384.0 | grad norm: 76832.152 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2913/ 159576 | consumed samples: 46608 | elapsed time per iteration (ms): 13861.5 | learning rate: 1.291E-05 | global batch size: 16 | lm loss: 6.854298E+00 | loss scale: 16384.0 | grad norm: 72132.986 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2914/ 159576 | consumed samples: 46624 | elapsed time per iteration (ms): 13559.0 | learning rate: 1.291E-05 | global batch size: 16 | lm loss: 6.579864E+00 | loss scale: 16384.0 | grad norm: 74308.017 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2915/ 159576 | consumed samples: 46640 | elapsed time per iteration (ms): 13594.5 | learning rate: 1.292E-05 | global batch size: 16 | lm loss: 6.756865E+00 | loss scale: 16384.0 | grad norm: 54456.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2916/ 159576 | consumed samples: 46656 | elapsed time per iteration (ms): 13569.5 | learning rate: 1.292E-05 | global batch size: 16 | lm loss: 6.743901E+00 | loss scale: 16384.0 | grad norm: 55395.902 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2917/ 159576 | consumed samples: 46672 | elapsed time per iteration (ms): 13964.6 | learning rate: 1.293E-05 | global batch size: 16 | lm loss: 6.671132E+00 | loss scale: 16384.0 | grad norm: 82925.647 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2918/ 159576 | consumed samples: 46688 | elapsed time per iteration (ms): 13641.5 | learning rate: 1.293E-05 | global batch size: 16 | lm loss: 6.554927E+00 | loss scale: 16384.0 | grad norm: 64164.151 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2919/ 159576 | consumed samples: 46704 | elapsed time per iteration (ms): 13635.2 | learning rate: 1.294E-05 | global batch size: 16 | lm loss: 6.848719E+00 | loss scale: 16384.0 | grad norm: 67718.918 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2920/ 159576 | consumed samples: 46720 | elapsed time per iteration (ms): 13603.6 | learning rate: 1.294E-05 | global batch size: 16 | lm loss: 6.609835E+00 | loss scale: 16384.0 | grad norm: 64921.543 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2921/ 159576 | consumed samples: 46736 | elapsed time per iteration (ms): 13865.5 | learning rate: 1.295E-05 | global batch size: 16 | lm loss: 6.699195E+00 | loss scale: 16384.0 | grad norm: 76865.088 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2922/ 159576 | consumed samples: 46752 | elapsed time per iteration (ms): 13659.4 | learning rate: 1.295E-05 | global batch size: 16 | lm loss: 6.821632E+00 | loss scale: 16384.0 | grad norm: 105825.800 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2923/ 159576 | consumed samples: 46768 | elapsed time per iteration (ms): 13539.7 | learning rate: 1.295E-05 | global batch size: 16 | lm loss: 6.632296E+00 | loss scale: 16384.0 | grad norm: 85548.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2924/ 159576 | consumed samples: 46784 | elapsed time per iteration (ms): 13587.6 | learning rate: 1.296E-05 | global batch size: 16 | lm loss: 6.782111E+00 | loss scale: 16384.0 | grad norm: 64005.847 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2925/ 159576 | consumed samples: 46800 | elapsed time per iteration (ms): 13566.6 | learning rate: 1.296E-05 | global batch size: 16 | lm loss: 6.513734E+00 | loss scale: 16384.0 | grad norm: 74875.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2926/ 159576 | consumed samples: 46816 | elapsed time per iteration (ms): 13817.4 | learning rate: 1.297E-05 | global batch size: 16 | lm loss: 6.610899E+00 | loss scale: 16384.0 | grad norm: 69678.116 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2927/ 159576 | consumed samples: 46832 | elapsed time per iteration (ms): 13615.5 | learning rate: 1.297E-05 | global batch size: 16 | lm loss: 7.086233E+00 | loss scale: 16384.0 | grad norm: 70522.964 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2928/ 159576 | consumed samples: 46848 | elapsed time per iteration (ms): 13566.8 | learning rate: 1.298E-05 | global batch size: 16 | lm loss: 6.598146E+00 | loss scale: 16384.0 | grad norm: 103276.135 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2929/ 159576 | consumed samples: 46864 | elapsed time per iteration (ms): 13567.1 | learning rate: 1.298E-05 | global batch size: 16 | lm loss: 6.593244E+00 | loss scale: 16384.0 | grad norm: 78523.778 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2930/ 159576 | consumed samples: 46880 | elapsed time per iteration (ms): 13919.4 | learning rate: 1.299E-05 | global batch size: 16 | lm loss: 6.528622E+00 | loss scale: 16384.0 | grad norm: 82737.988 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2931/ 159576 | consumed samples: 46896 | elapsed time per iteration (ms): 13557.6 | learning rate: 1.299E-05 | global batch size: 16 | lm loss: 6.605000E+00 | loss scale: 16384.0 | grad norm: 68077.419 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2932/ 159576 | consumed samples: 46912 | elapsed time per iteration (ms): 13570.1 | learning rate: 1.299E-05 | global batch size: 16 | lm loss: 6.595417E+00 | loss scale: 16384.0 | grad norm: 84602.521 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2933/ 159576 | consumed samples: 46928 | elapsed time per iteration (ms): 13606.8 | learning rate: 1.300E-05 | global batch size: 16 | lm loss: 6.730010E+00 | loss scale: 16384.0 | grad norm: 85745.847 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2934/ 159576 | consumed samples: 46944 | elapsed time per iteration (ms): 13584.8 | learning rate: 1.300E-05 | global batch size: 16 | lm loss: 6.689770E+00 | loss scale: 16384.0 | grad norm: 62655.073 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2935/ 159576 | consumed samples: 46960 | elapsed time per iteration (ms): 14053.4 | learning rate: 1.301E-05 | global batch size: 16 | lm loss: 6.715128E+00 | loss scale: 16384.0 | grad norm: 65695.924 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2936/ 159576 | consumed samples: 46976 | elapsed time per iteration (ms): 13589.9 | learning rate: 1.301E-05 | global batch size: 16 | lm loss: 6.651369E+00 | loss scale: 16384.0 | grad norm: 55322.170 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2937/ 159576 | consumed samples: 46992 | elapsed time per iteration (ms): 13553.6 | learning rate: 1.302E-05 | global batch size: 16 | lm loss: 6.646598E+00 | loss scale: 16384.0 | grad norm: 105686.832 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2938/ 159576 | consumed samples: 47008 | elapsed time per iteration (ms): 13584.5 | learning rate: 1.302E-05 | global batch size: 16 | lm loss: 6.798124E+00 | loss scale: 16384.0 | grad norm: 62478.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2939/ 159576 | consumed samples: 47024 | elapsed time per iteration (ms): 13902.5 | learning rate: 1.303E-05 | global batch size: 16 | lm loss: 6.594469E+00 | loss scale: 16384.0 | grad norm: 66128.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2940/ 159576 | consumed samples: 47040 | elapsed time per iteration (ms): 13632.4 | learning rate: 1.303E-05 | global batch size: 16 | lm loss: 6.642596E+00 | loss scale: 16384.0 | grad norm: 70291.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2941/ 159576 | consumed samples: 47056 | elapsed time per iteration (ms): 13595.9 | learning rate: 1.303E-05 | global batch size: 16 | lm loss: 6.428228E+00 | loss scale: 16384.0 | grad norm: 88273.080 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2942/ 159576 | consumed samples: 47072 | elapsed time per iteration (ms): 13622.0 | learning rate: 1.304E-05 | global batch size: 16 | lm loss: 6.776118E+00 | loss scale: 16384.0 | grad norm: 66140.535 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2943/ 159576 | consumed samples: 47088 | elapsed time per iteration (ms): 13949.2 | learning rate: 1.304E-05 | global batch size: 16 | lm loss: 6.678353E+00 | loss scale: 16384.0 | grad norm: 68411.066 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2944/ 159576 | consumed samples: 47104 | elapsed time per iteration (ms): 13581.2 | learning rate: 1.305E-05 | global batch size: 16 | lm loss: 6.679141E+00 | loss scale: 16384.0 | grad norm: 85622.880 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2945/ 159576 | consumed samples: 47120 | elapsed time per iteration (ms): 13544.3 | learning rate: 1.305E-05 | global batch size: 16 | lm loss: 6.620451E+00 | loss scale: 16384.0 | grad norm: 62226.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2946/ 159576 | consumed samples: 47136 | elapsed time per iteration (ms): 13593.9 | learning rate: 1.306E-05 | global batch size: 16 | lm loss: 6.719603E+00 | loss scale: 16384.0 | grad norm: 90885.483 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2947/ 159576 | consumed samples: 47152 | elapsed time per iteration (ms): 13604.3 | learning rate: 1.306E-05 | global batch size: 16 | lm loss: 6.704114E+00 | loss scale: 16384.0 | grad norm: 67182.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2948/ 159576 | consumed samples: 47168 | elapsed time per iteration (ms): 13746.5 | learning rate: 1.307E-05 | global batch size: 16 | lm loss: 6.781267E+00 | loss scale: 16384.0 | grad norm: 85616.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2949/ 159576 | consumed samples: 47184 | elapsed time per iteration (ms): 13612.1 | learning rate: 1.307E-05 | global batch size: 16 | lm loss: 6.878286E+00 | loss scale: 16384.0 | grad norm: 83807.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2950/ 159576 | consumed samples: 47200 | elapsed time per iteration (ms): 13656.8 | learning rate: 1.307E-05 | global batch size: 16 | lm loss: 6.808831E+00 | loss scale: 16384.0 | grad norm: 99669.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2951/ 159576 | consumed samples: 47216 | elapsed time per iteration (ms): 13662.4 | learning rate: 1.308E-05 | global batch size: 16 | lm loss: 6.751644E+00 | loss scale: 16384.0 | grad norm: 60477.798 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2952/ 159576 | consumed samples: 47232 | elapsed time per iteration (ms): 13999.0 | learning rate: 1.308E-05 | global batch size: 16 | lm loss: 6.593210E+00 | loss scale: 16384.0 | grad norm: 72293.070 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2953/ 159576 | consumed samples: 47248 | elapsed time per iteration (ms): 13609.1 | learning rate: 1.309E-05 | global batch size: 16 | lm loss: 6.662547E+00 | loss scale: 16384.0 | grad norm: 49910.061 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2954/ 159576 | consumed samples: 47280 | elapsed time per iteration (ms): 14635.0 | learning rate: 1.310E-05 | global batch size: 32 | lm loss: 6.688079E+00 | loss scale: 16384.0 | grad norm: 111598.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2955/ 159576 | consumed samples: 47312 | elapsed time per iteration (ms): 14591.8 | learning rate: 1.311E-05 | global batch size: 32 | lm loss: 6.657289E+00 | loss scale: 16384.0 | grad norm: 67597.793 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2956/ 159576 | consumed samples: 47344 | elapsed time per iteration (ms): 15030.0 | learning rate: 1.311E-05 | global batch size: 32 | lm loss: 6.554570E+00 | loss scale: 16384.0 | grad norm: 69780.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2957/ 159576 | consumed samples: 47376 | elapsed time per iteration (ms): 14563.7 | learning rate: 1.312E-05 | global batch size: 32 | lm loss: 6.741304E+00 | loss scale: 16384.0 | grad norm: 58633.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2958/ 159576 | consumed samples: 47408 | elapsed time per iteration (ms): 14589.9 | learning rate: 1.313E-05 | global batch size: 32 | lm loss: 6.601515E+00 | loss scale: 16384.0 | grad norm: 107295.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2959/ 159576 | consumed samples: 47440 | elapsed time per iteration (ms): 14625.1 | learning rate: 1.314E-05 | global batch size: 32 | lm loss: 6.683945E+00 | loss scale: 16384.0 | grad norm: 81347.650 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2960/ 159576 | consumed samples: 47472 | elapsed time per iteration (ms): 14964.2 | learning rate: 1.315E-05 | global batch size: 32 | lm loss: 6.790781E+00 | loss scale: 16384.0 | grad norm: 77191.821 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2961/ 159576 | consumed samples: 47504 | elapsed time per iteration (ms): 14557.0 | learning rate: 1.316E-05 | global batch size: 32 | lm loss: 6.749201E+00 | loss scale: 16384.0 | grad norm: 82408.963 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2962/ 159576 | consumed samples: 47536 | elapsed time per iteration (ms): 14666.5 | learning rate: 1.317E-05 | global batch size: 32 | lm loss: 6.532114E+00 | loss scale: 16384.0 | grad norm: 51870.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2963/ 159576 | consumed samples: 47568 | elapsed time per iteration (ms): 14537.9 | learning rate: 1.318E-05 | global batch size: 32 | lm loss: 6.660976E+00 | loss scale: 16384.0 | grad norm: 66392.838 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
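At iteration 2954 the global batch size steps from 16 to 32, consumed samples start advancing by 32 per iteration, and the per-iteration time rises accordingly: the linear batch-size ramp-up has reached its first increment. The jump lands at roughly 47.2k consumed samples, which is consistent with ramping from 16 toward a much larger target batch in steps of 16 spread over several million samples (Megatron's --rampup-batch-size <start> <increment> <samples>). A hedged sketch of that rule; the exact flag values below are assumptions for illustration:

    def rampup_batch_size(consumed_samples: int,
                          start: int = 16, increment: int = 16,
                          ramp_samples: int = 6_000_000,
                          target: int = 2048) -> int:
        """Megatron-style linear ramp: +increment every ramp_samples/steps samples."""
        steps = (target - start) // increment        # 127 increments of 16
        samples_per_step = ramp_samples // steps     # ~47,244 samples each
        bs = start + (consumed_samples // samples_per_step) * increment
        return min(bs, target)

With these assumed values, rampup_batch_size(47232) returns 16 and rampup_batch_size(47248) returns 32, matching the transition between iterations 2953 and 2954 above.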
iterations: 0 | time (ms) iteration 2964/ 159576 | consumed samples: 47600 | elapsed time per iteration (ms): 15078.8 | learning rate: 1.318E-05 | global batch size: 32 | lm loss: 6.526144E+00 | loss scale: 16384.0 | grad norm: 54716.550 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2965/ 159576 | consumed samples: 47632 | elapsed time per iteration (ms): 14737.9 | learning rate: 1.319E-05 | global batch size: 32 | lm loss: 6.649373E+00 | loss scale: 16384.0 | grad norm: 51359.795 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2966/ 159576 | consumed samples: 47664 | elapsed time per iteration (ms): 14559.9 | learning rate: 1.320E-05 | global batch size: 32 | lm loss: 6.672748E+00 | loss scale: 16384.0 | grad norm: 73789.982 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2967/ 159576 | consumed samples: 47696 | elapsed time per iteration (ms): 14642.3 | learning rate: 1.321E-05 | global batch size: 32 | lm loss: 6.662704E+00 | loss scale: 16384.0 | grad norm: 66303.151 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2968/ 159576 | consumed samples: 47728 | elapsed time per iteration (ms): 14852.7 | learning rate: 1.322E-05 | global batch size: 32 | lm loss: 6.624488E+00 | loss scale: 16384.0 | grad norm: 59052.609 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2969/ 159576 | consumed samples: 47760 | elapsed time per iteration (ms): 14836.6 | learning rate: 1.323E-05 | global batch size: 32 | lm loss: 6.600084E+00 | loss scale: 16384.0 | grad norm: 62547.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2970/ 159576 | consumed samples: 47792 | elapsed time per iteration (ms): 14593.7 | learning rate: 1.324E-05 | global batch size: 32 | lm loss: 6.517389E+00 | loss scale: 16384.0 | grad norm: 60694.546 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2971/ 159576 | consumed samples: 47824 | elapsed time per iteration (ms): 14618.4 | learning rate: 1.325E-05 | global batch size: 32 | lm loss: 6.548014E+00 | loss scale: 16384.0 | grad norm: 43913.010 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2972/ 159576 | consumed samples: 47856 | elapsed time per iteration (ms): 14695.6 | learning rate: 1.326E-05 | global batch size: 32 | lm loss: 6.593935E+00 | loss scale: 16384.0 | grad norm: 63488.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2973/ 159576 | consumed samples: 47888 | elapsed time per iteration (ms): 14827.1 | learning rate: 1.326E-05 | global batch size: 32 | lm loss: 6.572222E+00 | loss scale: 16384.0 | grad norm: 54368.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2974/ 159576 | consumed samples: 47920 | elapsed time per iteration (ms): 14620.6 | learning rate: 1.327E-05 | global batch size: 32 | lm loss: 6.550548E+00 | loss scale: 16384.0 | grad norm: 87940.074 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2975/ 159576 | consumed samples: 47952 | elapsed time per iteration (ms): 14622.4 | learning rate: 1.328E-05 | global batch size: 32 | lm loss: 6.529421E+00 | loss 
scale: 16384.0 | grad norm: 60145.649 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2976/ 159576 | consumed samples: 47984 | elapsed time per iteration (ms): 14586.4 | learning rate: 1.329E-05 | global batch size: 32 | lm loss: 6.765855E+00 | loss scale: 16384.0 | grad norm: 83899.803 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2977/ 159576 | consumed samples: 48016 | elapsed time per iteration (ms): 14810.9 | learning rate: 1.330E-05 | global batch size: 32 | lm loss: 6.630699E+00 | loss scale: 16384.0 | grad norm: 44149.072 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2978/ 159576 | consumed samples: 48048 | elapsed time per iteration (ms): 14685.4 | learning rate: 1.331E-05 | global batch size: 32 | lm loss: 6.561995E+00 | loss scale: 16384.0 | grad norm: 87446.523 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2979/ 159576 | consumed samples: 48080 | elapsed time per iteration (ms): 14648.9 | learning rate: 1.332E-05 | global batch size: 32 | lm loss: 6.467924E+00 | loss scale: 16384.0 | grad norm: 65034.519 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2980/ 159576 | consumed samples: 48112 | elapsed time per iteration (ms): 14615.3 | learning rate: 1.333E-05 | global batch size: 32 | lm loss: 6.649030E+00 | loss scale: 16384.0 | grad norm: 92148.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2981/ 159576 | consumed samples: 48144 | elapsed time per iteration (ms): 14681.7 | learning rate: 1.334E-05 | global batch size: 32 | lm loss: 6.749784E+00 | loss scale: 16384.0 | grad norm: 61670.963 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2982/ 159576 | consumed samples: 48176 | elapsed time per iteration (ms): 14509.6 | learning rate: 1.334E-05 | global batch size: 32 | lm loss: 6.567672E+00 | loss scale: 16384.0 | grad norm: 79628.022 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2983/ 159576 | consumed samples: 48208 | elapsed time per iteration (ms): 14555.2 | learning rate: 1.335E-05 | global batch size: 32 | lm loss: 6.676024E+00 | loss scale: 16384.0 | grad norm: 65136.709 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2984/ 159576 | consumed samples: 48240 | elapsed time per iteration (ms): 14572.2 | learning rate: 1.336E-05 | global batch size: 32 | lm loss: 6.467518E+00 | loss scale: 16384.0 | grad norm: 90637.625 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2985/ 159576 | consumed samples: 48272 | elapsed time per iteration (ms): 14888.7 | learning rate: 1.337E-05 | global batch size: 32 | lm loss: 6.586103E+00 | loss scale: 16384.0 | grad norm: 81306.452 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2986/ 159576 | consumed samples: 48304 | elapsed time per iteration (ms): 14588.0 | learning rate: 1.338E-05 | global batch size: 32 | lm loss: 6.541125E+00 | loss scale: 16384.0 | grad norm: 62368.768 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2987/ 159576 | consumed samples: 48336 | elapsed time per 
iteration (ms): 14597.9 | learning rate: 1.339E-05 | global batch size: 32 | lm loss: 6.591407E+00 | loss scale: 16384.0 | grad norm: 87504.003 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2988/ 159576 | consumed samples: 48368 | elapsed time per iteration (ms): 14590.3 | learning rate: 1.340E-05 | global batch size: 32 | lm loss: 6.678365E+00 | loss scale: 16384.0 | grad norm: 78293.170 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2989/ 159576 | consumed samples: 48400 | elapsed time per iteration (ms): 15031.9 | learning rate: 1.341E-05 | global batch size: 32 | lm loss: 6.564939E+00 | loss scale: 16384.0 | grad norm: 77173.924 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2990/ 159576 | consumed samples: 48432 | elapsed time per iteration (ms): 14705.4 | learning rate: 1.342E-05 | global batch size: 32 | lm loss: 6.692814E+00 | loss scale: 16384.0 | grad norm: 57544.626 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2991/ 159576 | consumed samples: 48464 | elapsed time per iteration (ms): 14586.3 | learning rate: 1.342E-05 | global batch size: 32 | lm loss: 6.628499E+00 | loss scale: 16384.0 | grad norm: 75164.585 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2992/ 159576 | consumed samples: 48496 | elapsed time per iteration (ms): 14624.5 | learning rate: 1.343E-05 | global batch size: 32 | lm loss: 6.582328E+00 | loss scale: 16384.0 | grad norm: 79666.420 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2993/ 159576 | consumed samples: 48528 | elapsed time per iteration (ms): 14950.3 | learning rate: 1.344E-05 | global batch size: 32 | lm loss: 6.558386E+00 | loss scale: 16384.0 | grad norm: 55234.958 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2994/ 159576 | consumed samples: 48560 | elapsed time per iteration (ms): 14695.8 | learning rate: 1.345E-05 | global batch size: 32 | lm loss: 6.676173E+00 | loss scale: 16384.0 | grad norm: 99524.065 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2995/ 159576 | consumed samples: 48592 | elapsed time per iteration (ms): 14559.9 | learning rate: 1.346E-05 | global batch size: 32 | lm loss: 6.529976E+00 | loss scale: 16384.0 | grad norm: 70438.746 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2996/ 159576 | consumed samples: 48624 | elapsed time per iteration (ms): 14633.0 | learning rate: 1.347E-05 | global batch size: 32 | lm loss: 6.703721E+00 | loss scale: 16384.0 | grad norm: 67881.530 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2997/ 159576 | consumed samples: 48656 | elapsed time per iteration (ms): 14953.9 | learning rate: 1.348E-05 | global batch size: 32 | lm loss: 6.681076E+00 | loss scale: 16384.0 | grad norm: 65351.573 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2998/ 159576 | consumed samples: 48688 | elapsed time per iteration (ms): 14564.8 | learning rate: 1.349E-05 | global batch size: 32 | lm loss: 6.741710E+00 | loss scale: 16384.0 | grad norm: 80607.424 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | time (ms) iteration 2999/ 159576 | consumed samples: 48720 | elapsed time per iteration (ms): 14589.6 | learning rate: 1.350E-05 | global batch size: 32 | lm loss: 6.628071E+00 | loss scale: 16384.0 | grad norm: 80325.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3000/ 159576 | consumed samples: 48752 | elapsed time per iteration (ms): 14580.8 | learning rate: 1.350E-05 | global batch size: 32 | lm loss: 6.770764E+00 | loss scale: 16384.0 | grad norm: 62881.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) ------------------------------------------------------------------------------------------------ validation loss at iteration 3000 | lm loss value: 6.650529E+00 | lm loss PPL: 7.731932E+02 | ------------------------------------------------------------------------------------------------ saving checkpoint at iteration 3000 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints [2021-09-24 13:45:23,476] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/global_step3000/mp_rank_00_model_states.pt successfully saved checkpoint at iteration 3000 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints time (ms) | save-checkpoint: 19970.36 iteration 3001/ 159576 | consumed samples: 48784 | elapsed time per iteration (ms): 40923.5 | learning rate: 1.351E-05 | global batch size: 32 | lm loss: 6.512466E+00 | loss scale: 16384.0 | grad norm: 78526.093 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3002/ 159576 | consumed samples: 48816 | elapsed time per iteration (ms): 14454.5 | learning rate: 1.352E-05 | global batch size: 32 | lm loss: 6.725769E+00 | loss scale: 16384.0 | grad norm: 52532.916 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3003/ 159576 | consumed samples: 48848 | elapsed time per iteration (ms): 14508.9 | learning rate: 1.353E-05 | global batch size: 32 | lm loss: 6.778904E+00 | loss scale: 16384.0 | grad norm: 61815.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3004/ 159576 | consumed samples: 48880 | elapsed time per iteration (ms): 14774.8 | learning rate: 1.354E-05 | global batch size: 32 | lm loss: 6.600959E+00 | loss scale: 16384.0 | grad norm: 72563.840 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3005/ 159576 | consumed samples: 48912 | elapsed time per iteration (ms): 14543.7 | learning rate: 1.355E-05 | global batch size: 32 | lm loss: 6.630536E+00 | loss scale: 16384.0 | grad norm: 52120.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3006/ 159576 | consumed samples: 48944 | elapsed time per iteration (ms): 14501.8 | learning rate: 1.356E-05 | global batch size: 32 | lm loss: 6.661976E+00 | loss scale: 16384.0 | grad norm: 60799.900 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3007/ 159576 | consumed samples: 48976 | elapsed time per iteration (ms): 14465.0 | learning rate: 1.357E-05 | global batch size: 32 | lm loss: 6.695879E+00 | loss scale: 16384.0 | grad norm: 55470.787 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3008/ 159576 | 
consumed samples: 49008 | elapsed time per iteration (ms): 14696.5 | learning rate: 1.358E-05 | global batch size: 32 | lm loss: 6.613426E+00 | loss scale: 16384.0 | grad norm: 80502.152 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3009/ 159576 | consumed samples: 49040 | elapsed time per iteration (ms): 14441.9 | learning rate: 1.358E-05 | global batch size: 32 | lm loss: 6.640174E+00 | loss scale: 16384.0 | grad norm: 53100.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3010/ 159576 | consumed samples: 49072 | elapsed time per iteration (ms): 14484.3 | learning rate: 1.359E-05 | global batch size: 32 | lm loss: 6.660203E+00 | loss scale: 16384.0 | grad norm: 69573.492 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3011/ 159576 | consumed samples: 49104 | elapsed time per iteration (ms): 14599.1 | learning rate: 1.360E-05 | global batch size: 32 | lm loss: 6.674448E+00 | loss scale: 16384.0 | grad norm: 49737.183 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3012/ 159576 | consumed samples: 49136 | elapsed time per iteration (ms): 14701.4 | learning rate: 1.361E-05 | global batch size: 32 | lm loss: 6.607582E+00 | loss scale: 16384.0 | grad norm: 121923.648 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3013/ 159576 | consumed samples: 49168 | elapsed time per iteration (ms): 14527.2 | learning rate: 1.362E-05 | global batch size: 32 | lm loss: 6.552118E+00 | loss scale: 16384.0 | grad norm: 86117.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3014/ 159576 | consumed samples: 49200 | elapsed time per iteration (ms): 14528.7 | learning rate: 1.363E-05 | global batch size: 32 | lm loss: 6.628557E+00 | loss scale: 16384.0 | grad norm: 65341.685 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3015/ 159576 | consumed samples: 49232 | elapsed time per iteration (ms): 14528.2 | learning rate: 1.364E-05 | global batch size: 32 | lm loss: 6.637073E+00 | loss scale: 16384.0 | grad norm: 56388.918 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3016/ 159576 | consumed samples: 49264 | elapsed time per iteration (ms): 14818.6 | learning rate: 1.365E-05 | global batch size: 32 | lm loss: 6.643037E+00 | loss scale: 16384.0 | grad norm: 92476.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3017/ 159576 | consumed samples: 49296 | elapsed time per iteration (ms): 14532.4 | learning rate: 1.366E-05 | global batch size: 32 | lm loss: 6.517512E+00 | loss scale: 16384.0 | grad norm: 69528.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3018/ 159576 | consumed samples: 49328 | elapsed time per iteration (ms): 14482.9 | learning rate: 1.366E-05 | global batch size: 32 | lm loss: 6.593336E+00 | loss scale: 16384.0 | grad norm: 58227.816 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3019/ 159576 | consumed samples: 49360 | elapsed time per iteration (ms): 14483.3 | learning rate: 1.367E-05 | global batch size: 32 | lm loss: 6.682046E+00 | loss scale: 16384.0 | grad norm: 77807.619 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3020/ 159576 | consumed samples: 49392 | elapsed time per iteration (ms): 15039.4 | learning rate: 1.368E-05 | global batch size: 32 | lm loss: 6.511760E+00 | loss scale: 16384.0 | grad norm: 61711.980 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3021/ 159576 | consumed samples: 49424 | elapsed time per iteration (ms): 14532.3 | learning rate: 1.369E-05 | global batch size: 32 | lm loss: 6.601027E+00 | loss scale: 16384.0 | grad norm: 59045.542 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3022/ 159576 | consumed samples: 49456 | elapsed time per iteration (ms): 14411.9 | learning rate: 1.370E-05 | global batch size: 32 | lm loss: 6.669757E+00 | loss scale: 16384.0 | grad norm: 79072.886 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3023/ 159576 | consumed samples: 49488 | elapsed time per iteration (ms): 14433.5 | learning rate: 1.371E-05 | global batch size: 32 | lm loss: 6.660283E+00 | loss scale: 16384.0 | grad norm: 83581.808 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3024/ 159576 | consumed samples: 49520 | elapsed time per iteration (ms): 14915.2 | learning rate: 1.372E-05 | global batch size: 32 | lm loss: 6.621551E+00 | loss scale: 16384.0 | grad norm: 64854.144 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3025/ 159576 | consumed samples: 49552 | elapsed time per iteration (ms): 14425.9 | learning rate: 1.373E-05 | global batch size: 32 | lm loss: 6.591113E+00 | loss scale: 16384.0 | grad norm: 52620.079 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3026/ 159576 | consumed samples: 49584 | elapsed time per iteration (ms): 14542.0 | learning rate: 1.374E-05 | global batch size: 32 | lm loss: 6.659728E+00 | loss scale: 16384.0 | grad norm: 50471.019 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3027/ 159576 | consumed samples: 49616 | elapsed time per iteration (ms): 14493.7 | learning rate: 1.374E-05 | global batch size: 32 | lm loss: 6.786015E+00 | loss scale: 16384.0 | grad norm: 89599.838 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3028/ 159576 | consumed samples: 49648 | elapsed time per iteration (ms): 14955.9 | learning rate: 1.375E-05 | global batch size: 32 | lm loss: 6.515626E+00 | loss scale: 16384.0 | grad norm: 71757.893 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3029/ 159576 | consumed samples: 49680 | elapsed time per iteration (ms): 14451.8 | learning rate: 1.376E-05 | global batch size: 32 | lm loss: 6.552487E+00 | loss scale: 16384.0 | grad norm: 59493.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3030/ 159576 | consumed samples: 49712 | elapsed time per iteration (ms): 14565.2 | learning rate: 1.377E-05 | global batch size: 32 | lm loss: 6.515723E+00 | loss scale: 16384.0 | grad norm: 70621.618 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3031/ 159576 | consumed samples: 49744 | elapsed time per iteration (ms): 14573.9 | learning rate: 
1.378E-05 | global batch size: 32 | lm loss: 6.533678E+00 | loss scale: 16384.0 | grad norm: 67416.578 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3032/ 159576 | consumed samples: 49776 | elapsed time per iteration (ms): 14838.7 | learning rate: 1.379E-05 | global batch size: 32 | lm loss: 6.558086E+00 | loss scale: 16384.0 | grad norm: 57733.715 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3033/ 159576 | consumed samples: 49808 | elapsed time per iteration (ms): 14602.8 | learning rate: 1.380E-05 | global batch size: 32 | lm loss: 6.520467E+00 | loss scale: 16384.0 | grad norm: 82103.090 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3034/ 159576 | consumed samples: 49840 | elapsed time per iteration (ms): 14562.2 | learning rate: 1.381E-05 | global batch size: 32 | lm loss: 6.583010E+00 | loss scale: 16384.0 | grad norm: 49461.985 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3035/ 159576 | consumed samples: 49872 | elapsed time per iteration (ms): 14551.2 | learning rate: 1.382E-05 | global batch size: 32 | lm loss: 6.614191E+00 | loss scale: 16384.0 | grad norm: 42934.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3036/ 159576 | consumed samples: 49904 | elapsed time per iteration (ms): 15033.1 | learning rate: 1.382E-05 | global batch size: 32 | lm loss: 6.646058E+00 | loss scale: 16384.0 | grad norm: 72475.817 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3037/ 159576 | consumed samples: 49936 | elapsed time per iteration (ms): 14506.7 | learning rate: 1.383E-05 | global batch size: 32 | lm loss: 6.657450E+00 | loss scale: 16384.0 | grad norm: 51862.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3038/ 159576 | consumed samples: 49968 | elapsed time per iteration (ms): 14535.4 | learning rate: 1.384E-05 | global batch size: 32 | lm loss: 6.474831E+00 | loss scale: 16384.0 | grad norm: 54826.780 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3039/ 159576 | consumed samples: 50000 | elapsed time per iteration (ms): 14517.2 | learning rate: 1.385E-05 | global batch size: 32 | lm loss: 6.491888E+00 | loss scale: 16384.0 | grad norm: 48045.161 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3040/ 159576 | consumed samples: 50032 | elapsed time per iteration (ms): 14679.0 | learning rate: 1.386E-05 | global batch size: 32 | lm loss: 6.557182E+00 | loss scale: 16384.0 | grad norm: 79148.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3041/ 159576 | consumed samples: 50064 | elapsed time per iteration (ms): 14829.2 | learning rate: 1.387E-05 | global batch size: 32 | lm loss: 6.624621E+00 | loss scale: 16384.0 | grad norm: 50930.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3042/ 159576 | consumed samples: 50096 | elapsed time per iteration (ms): 14560.9 | learning rate: 1.388E-05 | global batch size: 32 | lm loss: 6.572658E+00 | loss scale: 16384.0 | grad norm: 72539.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) 
iteration 3043/ 159576 | consumed samples: 50128 | elapsed time per iteration (ms): 14616.0 | learning rate: 1.389E-05 | global batch size: 32 | lm loss: 6.654581E+00 | loss scale: 16384.0 | grad norm: 66089.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3044/ 159576 | consumed samples: 50160 | elapsed time per iteration (ms): 14597.6 | learning rate: 1.389E-05 | global batch size: 32 | lm loss: 6.568760E+00 | loss scale: 16384.0 | grad norm: 77389.521 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3045/ 159576 | consumed samples: 50192 | elapsed time per iteration (ms): 14717.8 | learning rate: 1.390E-05 | global batch size: 32 | lm loss: 6.562954E+00 | loss scale: 16384.0 | grad norm: 59175.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3046/ 159576 | consumed samples: 50224 | elapsed time per iteration (ms): 14549.8 | learning rate: 1.391E-05 | global batch size: 32 | lm loss: 6.519083E+00 | loss scale: 16384.0 | grad norm: 72573.917 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3047/ 159576 | consumed samples: 50256 | elapsed time per iteration (ms): 14547.8 | learning rate: 1.392E-05 | global batch size: 32 | lm loss: 6.586189E+00 | loss scale: 16384.0 | grad norm: 63454.734 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3048/ 159576 | consumed samples: 50288 | elapsed time per iteration (ms): 14699.8 | learning rate: 1.393E-05 | global batch size: 32 | lm loss: 6.629214E+00 | loss scale: 16384.0 | grad norm: 49137.706 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3049/ 159576 | consumed samples: 50320 | elapsed time per iteration (ms): 14760.5 | learning rate: 1.394E-05 | global batch size: 32 | lm loss: 6.567476E+00 | loss scale: 16384.0 | grad norm: 59423.684 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3050/ 159576 | consumed samples: 50352 | elapsed time per iteration (ms): 14605.2 | learning rate: 1.395E-05 | global batch size: 32 | lm loss: 6.560441E+00 | loss scale: 16384.0 | grad norm: 76106.960 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3051/ 159576 | consumed samples: 50384 | elapsed time per iteration (ms): 14589.0 | learning rate: 1.396E-05 | global batch size: 32 | lm loss: 6.676329E+00 | loss scale: 16384.0 | grad norm: 43490.816 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3052/ 159576 | consumed samples: 50416 | elapsed time per iteration (ms): 14546.5 | learning rate: 1.397E-05 | global batch size: 32 | lm loss: 6.531154E+00 | loss scale: 16384.0 | grad norm: 77324.537 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3053/ 159576 | consumed samples: 50448 | elapsed time per iteration (ms): 14689.5 | learning rate: 1.397E-05 | global batch size: 32 | lm loss: 6.457368E+00 | loss scale: 16384.0 | grad norm: 61005.030 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3054/ 159576 | consumed samples: 50480 | elapsed time per iteration (ms): 14604.5 | learning rate: 1.398E-05 | global batch size: 32 | lm loss: 6.694659E+00 | loss scale: 16384.0 | grad 
norm: 50570.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3055/ 159576 | consumed samples: 50512 | elapsed time per iteration (ms): 14507.3 | learning rate: 1.399E-05 | global batch size: 32 | lm loss: 6.639795E+00 | loss scale: 16384.0 | grad norm: 57017.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3056/ 159576 | consumed samples: 50544 | elapsed time per iteration (ms): 14581.4 | learning rate: 1.400E-05 | global batch size: 32 | lm loss: 6.619573E+00 | loss scale: 16384.0 | grad norm: 60323.172 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3057/ 159576 | consumed samples: 50576 | elapsed time per iteration (ms): 15078.3 | learning rate: 1.401E-05 | global batch size: 32 | lm loss: 6.636419E+00 | loss scale: 16384.0 | grad norm: 49598.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3058/ 159576 | consumed samples: 50608 | elapsed time per iteration (ms): 14576.1 | learning rate: 1.402E-05 | global batch size: 32 | lm loss: 6.591126E+00 | loss scale: 16384.0 | grad norm: 102052.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3059/ 159576 | consumed samples: 50640 | elapsed time per iteration (ms): 14515.1 | learning rate: 1.403E-05 | global batch size: 32 | lm loss: 6.500241E+00 | loss scale: 16384.0 | grad norm: 52981.506 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3060/ 159576 | consumed samples: 50672 | elapsed time per iteration (ms): 14582.7 | learning rate: 1.404E-05 | global batch size: 32 | lm loss: 6.553960E+00 | loss scale: 16384.0 | grad norm: 57341.020 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3061/ 159576 | consumed samples: 50704 | elapsed time per iteration (ms): 14939.5 | learning rate: 1.405E-05 | global batch size: 32 | lm loss: 6.593186E+00 | loss scale: 16384.0 | grad norm: 50198.577 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3062/ 159576 | consumed samples: 50736 | elapsed time per iteration (ms): 14545.7 | learning rate: 1.405E-05 | global batch size: 32 | lm loss: 6.577888E+00 | loss scale: 16384.0 | grad norm: 90008.008 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3063/ 159576 | consumed samples: 50768 | elapsed time per iteration (ms): 14515.8 | learning rate: 1.406E-05 | global batch size: 32 | lm loss: 6.775355E+00 | loss scale: 16384.0 | grad norm: 52343.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3064/ 159576 | consumed samples: 50800 | elapsed time per iteration (ms): 14570.2 | learning rate: 1.407E-05 | global batch size: 32 | lm loss: 6.724249E+00 | loss scale: 16384.0 | grad norm: 69939.860 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3065/ 159576 | consumed samples: 50832 | elapsed time per iteration (ms): 14913.0 | learning rate: 1.408E-05 | global batch size: 32 | lm loss: 6.634195E+00 | loss scale: 16384.0 | grad norm: 70070.520 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3066/ 159576 | consumed samples: 50864 | elapsed time per iteration (ms): 
14497.8 | learning rate: 1.409E-05 | global batch size: 32 | lm loss: 6.591150E+00 | loss scale: 16384.0 | grad norm: 80109.931 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3067/ 159576 | consumed samples: 50896 | elapsed time per iteration (ms): 14593.4 | learning rate: 1.410E-05 | global batch size: 32 | lm loss: 6.637640E+00 | loss scale: 16384.0 | grad norm: 51104.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3068/ 159576 | consumed samples: 50928 | elapsed time per iteration (ms): 14459.7 | learning rate: 1.411E-05 | global batch size: 32 | lm loss: 6.595787E+00 | loss scale: 16384.0 | grad norm: 49458.599 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3069/ 159576 | consumed samples: 50960 | elapsed time per iteration (ms): 14904.6 | learning rate: 1.412E-05 | global batch size: 32 | lm loss: 6.762650E+00 | loss scale: 16384.0 | grad norm: 88087.529 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3070/ 159576 | consumed samples: 50992 | elapsed time per iteration (ms): 14578.7 | learning rate: 1.413E-05 | global batch size: 32 | lm loss: 6.615232E+00 | loss scale: 16384.0 | grad norm: 50851.426 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3071/ 159576 | consumed samples: 51024 | elapsed time per iteration (ms): 14534.9 | learning rate: 1.413E-05 | global batch size: 32 | lm loss: 6.502337E+00 | loss scale: 16384.0 | grad norm: 82199.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3072/ 159576 | consumed samples: 51056 | elapsed time per iteration (ms): 14555.3 | learning rate: 1.414E-05 | global batch size: 32 | lm loss: 6.552182E+00 | loss scale: 16384.0 | grad norm: 67542.628 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3073/ 159576 | consumed samples: 51088 | elapsed time per iteration (ms): 15069.2 | learning rate: 1.415E-05 | global batch size: 32 | lm loss: 6.449011E+00 | loss scale: 16384.0 | grad norm: 113973.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3074/ 159576 | consumed samples: 51120 | elapsed time per iteration (ms): 14473.5 | learning rate: 1.416E-05 | global batch size: 32 | lm loss: 6.462796E+00 | loss scale: 16384.0 | grad norm: 99530.753 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3075/ 159576 | consumed samples: 51152 | elapsed time per iteration (ms): 14578.5 | learning rate: 1.417E-05 | global batch size: 32 | lm loss: 6.605415E+00 | loss scale: 16384.0 | grad norm: 79580.590 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3076/ 159576 | consumed samples: 51184 | elapsed time per iteration (ms): 14526.0 | learning rate: 1.418E-05 | global batch size: 32 | lm loss: 6.643724E+00 | loss scale: 16384.0 | grad norm: 83910.537 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3077/ 159576 | consumed samples: 51216 | elapsed time per iteration (ms): 14932.5 | learning rate: 1.419E-05 | global batch size: 32 | lm loss: 6.554170E+00 | loss scale: 16384.0 | grad norm: 41888.605 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | time (ms) iteration 3078/ 159576 | consumed samples: 51248 | elapsed time per iteration (ms): 14631.5 | learning rate: 1.420E-05 | global batch size: 32 | lm loss: 6.609428E+00 | loss scale: 16384.0 | grad norm: 100795.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3079/ 159576 | consumed samples: 51280 | elapsed time per iteration (ms): 14613.6 | learning rate: 1.421E-05 | global batch size: 32 | lm loss: 6.647438E+00 | loss scale: 16384.0 | grad norm: 79478.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3080/ 159576 | consumed samples: 51312 | elapsed time per iteration (ms): 14624.3 | learning rate: 1.421E-05 | global batch size: 32 | lm loss: 6.548526E+00 | loss scale: 16384.0 | grad norm: 61687.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3081/ 159576 | consumed samples: 51344 | elapsed time per iteration (ms): 14941.2 | learning rate: 1.422E-05 | global batch size: 32 | lm loss: 6.559642E+00 | loss scale: 16384.0 | grad norm: 51017.983 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3082/ 159576 | consumed samples: 51376 | elapsed time per iteration (ms): 14650.5 | learning rate: 1.423E-05 | global batch size: 32 | lm loss: 6.513590E+00 | loss scale: 16384.0 | grad norm: 62838.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3083/ 159576 | consumed samples: 51408 | elapsed time per iteration (ms): 14595.1 | learning rate: 1.424E-05 | global batch size: 32 | lm loss: 6.454400E+00 | loss scale: 16384.0 | grad norm: 85218.080 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3084/ 159576 | consumed samples: 51440 | elapsed time per iteration (ms): 14539.5 | learning rate: 1.425E-05 | global batch size: 32 | lm loss: 6.667971E+00 | loss scale: 16384.0 | grad norm: 74883.565 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3085/ 159576 | consumed samples: 51472 | elapsed time per iteration (ms): 14496.8 | learning rate: 1.426E-05 | global batch size: 32 | lm loss: 6.608503E+00 | loss scale: 16384.0 | grad norm: 64204.771 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3086/ 159576 | consumed samples: 51504 | elapsed time per iteration (ms): 14686.0 | learning rate: 1.427E-05 | global batch size: 32 | lm loss: 6.699879E+00 | loss scale: 16384.0 | grad norm: 42613.436 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-24 14:06:36] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition) [2021-09-24 14:06:36] PULSE: tr8-104B is running for 8:14:25 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8]) iteration 3087/ 159576 | consumed samples: 51536 | elapsed time per iteration (ms): 14518.6 | learning rate: 1.428E-05 | global batch size: 32 | lm loss: 6.539448E+00 | loss scale: 16384.0 | grad norm: 88063.533 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3088/ 159576 | consumed samples: 51568 | elapsed time per iteration (ms): 14588.4 | learning rate: 1.429E-05 | global batch size: 32 | lm loss: 6.589184E+00 | loss scale: 16384.0 | grad norm: 54256.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3089/ 159576 | consumed samples: 51600 | elapsed time per iteration (ms): 14631.0 | learning rate: 1.429E-05 | global batch size: 32 | lm loss: 6.700484E+00 | loss scale: 16384.0 | grad norm: 54269.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3090/ 159576 | consumed samples: 51632 | elapsed time per iteration (ms): 14830.4 | learning rate: 1.430E-05 | global batch size: 32 | lm loss: 6.576167E+00 | loss scale: 16384.0 | grad norm: 57490.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3091/ 159576 | consumed samples: 51664 | elapsed time per iteration (ms): 14445.4 | learning rate: 1.431E-05 | global batch size: 32 | lm loss: 6.601985E+00 | loss scale: 16384.0 | grad norm: 57872.821 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3092/ 159576 | consumed samples: 51696 | elapsed time per iteration (ms): 14536.8 | learning rate: 1.432E-05 | global batch size: 32 | lm loss: 6.407238E+00 | loss scale: 16384.0 | grad norm: 52047.068 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3093/ 159576 | consumed samples: 51728 | elapsed time per iteration (ms): 14606.0 | learning rate: 1.433E-05 | global batch size: 32 | lm loss: 6.659007E+00 | loss scale: 16384.0 | grad norm: 76903.539 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3094/ 159576 | consumed samples: 51760 | elapsed time per iteration (ms): 14751.8 | learning rate: 1.434E-05 | global batch size: 32 | lm loss: 6.623207E+00 | loss scale: 16384.0 | grad norm: 98639.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3095/ 159576 | consumed samples: 51792 | elapsed time per iteration (ms): 14636.3 | learning rate: 1.435E-05 | global batch size: 32 | lm loss: 6.697064E+00 | loss scale: 16384.0 | grad norm: 59113.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3096/ 159576 | consumed samples: 51824 | elapsed time per iteration (ms): 14701.7 | learning rate: 1.436E-05 | global batch size: 32 | lm loss: 6.510694E+00 | loss scale: 16384.0 | grad norm: 57025.627 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3097/ 159576 | consumed samples: 51856 | elapsed time per iteration (ms): 14643.0 | learning rate: 1.437E-05 | global batch size: 32 | lm loss: 6.610021E+00 | loss scale: 16384.0 | grad norm: 90059.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3098/ 159576 | consumed samples: 51888 | elapsed time per iteration (ms): 14837.7 | learning rate: 1.437E-05 | global batch size: 32 | lm loss: 6.534551E+00 | loss scale: 16384.0 | grad norm: 45874.846 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3099/ 159576 | consumed samples: 51920 | elapsed time per iteration (ms): 14607.4 | learning rate: 1.438E-05 | global batch size: 32 
| lm loss: 6.517954E+00 | loss scale: 16384.0 | grad norm: 60226.775 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3100/ 159576 | consumed samples: 51952 | elapsed time per iteration (ms): 14537.4 | learning rate: 1.439E-05 | global batch size: 32 | lm loss: 6.457252E+00 | loss scale: 16384.0 | grad norm: 46090.904 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3101/ 159576 | consumed samples: 51984 | elapsed time per iteration (ms): 14526.9 | learning rate: 1.440E-05 | global batch size: 32 | lm loss: 6.609892E+00 | loss scale: 16384.0 | grad norm: 94724.964 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3102/ 159576 | consumed samples: 52016 | elapsed time per iteration (ms): 14927.9 | learning rate: 1.441E-05 | global batch size: 32 | lm loss: 6.698421E+00 | loss scale: 16384.0 | grad norm: 87402.445 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3103/ 159576 | consumed samples: 52048 | elapsed time per iteration (ms): 14723.0 | learning rate: 1.442E-05 | global batch size: 32 | lm loss: 6.607485E+00 | loss scale: 16384.0 | grad norm: 53552.486 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3104/ 159576 | consumed samples: 52080 | elapsed time per iteration (ms): 14655.6 | learning rate: 1.443E-05 | global batch size: 32 | lm loss: 6.771776E+00 | loss scale: 16384.0 | grad norm: 77470.084 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3105/ 159576 | consumed samples: 52112 | elapsed time per iteration (ms): 14632.7 | learning rate: 1.444E-05 | global batch size: 32 | lm loss: 6.573309E+00 | loss scale: 16384.0 | grad norm: 60932.025 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3106/ 159576 | consumed samples: 52144 | elapsed time per iteration (ms): 15115.7 | learning rate: 1.445E-05 | global batch size: 32 | lm loss: 6.610741E+00 | loss scale: 16384.0 | grad norm: 67949.607 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3107/ 159576 | consumed samples: 52176 | elapsed time per iteration (ms): 14559.3 | learning rate: 1.445E-05 | global batch size: 32 | lm loss: 6.538753E+00 | loss scale: 16384.0 | grad norm: 71734.909 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3108/ 159576 | consumed samples: 52208 | elapsed time per iteration (ms): 14588.4 | learning rate: 1.446E-05 | global batch size: 32 | lm loss: 6.527990E+00 | loss scale: 16384.0 | grad norm: 86170.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3109/ 159576 | consumed samples: 52240 | elapsed time per iteration (ms): 14660.3 | learning rate: 1.447E-05 | global batch size: 32 | lm loss: 6.556553E+00 | loss scale: 16384.0 | grad norm: 46751.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3110/ 159576 | consumed samples: 52272 | elapsed time per iteration (ms): 15046.4 | learning rate: 1.448E-05 | global batch size: 32 | lm loss: 6.566851E+00 | loss scale: 16384.0 | grad norm: 67209.417 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3111/ 159576 | consumed 
samples: 52304 | elapsed time per iteration (ms): 14570.9 | learning rate: 1.449E-05 | global batch size: 32 | lm loss: 6.635989E+00 | loss scale: 16384.0 | grad norm: 53538.451 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3112/ 159576 | consumed samples: 52336 | elapsed time per iteration (ms): 14664.0 | learning rate: 1.450E-05 | global batch size: 32 | lm loss: 6.739109E+00 | loss scale: 16384.0 | grad norm: 100581.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3113/ 159576 | consumed samples: 52368 | elapsed time per iteration (ms): 14690.0 | learning rate: 1.451E-05 | global batch size: 32 | lm loss: 6.534431E+00 | loss scale: 16384.0 | grad norm: 69366.573 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3114/ 159576 | consumed samples: 52400 | elapsed time per iteration (ms): 14854.6 | learning rate: 1.452E-05 | global batch size: 32 | lm loss: 6.481595E+00 | loss scale: 16384.0 | grad norm: 57933.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3115/ 159576 | consumed samples: 52432 | elapsed time per iteration (ms): 14581.0 | learning rate: 1.453E-05 | global batch size: 32 | lm loss: 6.466241E+00 | loss scale: 16384.0 | grad norm: 91764.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3116/ 159576 | consumed samples: 52464 | elapsed time per iteration (ms): 14603.8 | learning rate: 1.453E-05 | global batch size: 32 | lm loss: 6.818060E+00 | loss scale: 16384.0 | grad norm: 73322.908 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3117/ 159576 | consumed samples: 52496 | elapsed time per iteration (ms): 14655.4 | learning rate: 1.454E-05 | global batch size: 32 | lm loss: 6.541664E+00 | loss scale: 16384.0 | grad norm: 79876.153 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3118/ 159576 | consumed samples: 52528 | elapsed time per iteration (ms): 15059.6 | learning rate: 1.455E-05 | global batch size: 32 | lm loss: 6.582567E+00 | loss scale: 16384.0 | grad norm: 57737.032 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3119/ 159576 | consumed samples: 52560 | elapsed time per iteration (ms): 14561.2 | learning rate: 1.456E-05 | global batch size: 32 | lm loss: 6.616435E+00 | loss scale: 16384.0 | grad norm: 75078.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3120/ 159576 | consumed samples: 52592 | elapsed time per iteration (ms): 14627.9 | learning rate: 1.457E-05 | global batch size: 32 | lm loss: 6.688129E+00 | loss scale: 16384.0 | grad norm: 51450.549 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3121/ 159576 | consumed samples: 52624 | elapsed time per iteration (ms): 14579.2 | learning rate: 1.458E-05 | global batch size: 32 | lm loss: 6.456697E+00 | loss scale: 16384.0 | grad norm: 69973.541 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3122/ 159576 | consumed samples: 52656 | elapsed time per iteration (ms): 15025.4 | learning rate: 1.459E-05 | global batch size: 32 | lm loss: 6.629485E+00 | loss scale: 16384.0 | grad norm: 57268.432 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3123/ 159576 | consumed samples: 52688 | elapsed time per iteration (ms): 14578.8 | learning rate: 1.460E-05 | global batch size: 32 | lm loss: 6.404414E+00 | loss scale: 16384.0 | grad norm: 63882.617 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3124/ 159576 | consumed samples: 52720 | elapsed time per iteration (ms): 14582.6 | learning rate: 1.461E-05 | global batch size: 32 | lm loss: 6.473093E+00 | loss scale: 16384.0 | grad norm: 50308.716 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3125/ 159576 | consumed samples: 52752 | elapsed time per iteration (ms): 14640.7 | learning rate: 1.461E-05 | global batch size: 32 | lm loss: 6.497868E+00 | loss scale: 16384.0 | grad norm: 63650.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3126/ 159576 | consumed samples: 52784 | elapsed time per iteration (ms): 15046.6 | learning rate: 1.462E-05 | global batch size: 32 | lm loss: 6.549313E+00 | loss scale: 16384.0 | grad norm: 72289.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3127/ 159576 | consumed samples: 52816 | elapsed time per iteration (ms): 14723.2 | learning rate: 1.463E-05 | global batch size: 32 | lm loss: 6.590129E+00 | loss scale: 16384.0 | grad norm: 47547.789 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3128/ 159576 | consumed samples: 52848 | elapsed time per iteration (ms): 14552.7 | learning rate: 1.464E-05 | global batch size: 32 | lm loss: 6.731832E+00 | loss scale: 16384.0 | grad norm: 68103.769 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3129/ 159576 | consumed samples: 52880 | elapsed time per iteration (ms): 14573.2 | learning rate: 1.465E-05 | global batch size: 32 | lm loss: 6.528438E+00 | loss scale: 16384.0 | grad norm: 57671.557 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3130/ 159576 | consumed samples: 52912 | elapsed time per iteration (ms): 14663.9 | learning rate: 1.466E-05 | global batch size: 32 | lm loss: 6.672345E+00 | loss scale: 16384.0 | grad norm: 42986.119 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3131/ 159576 | consumed samples: 52944 | elapsed time per iteration (ms): 14852.7 | learning rate: 1.467E-05 | global batch size: 32 | lm loss: 6.489813E+00 | loss scale: 16384.0 | grad norm: 54642.587 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3132/ 159576 | consumed samples: 52976 | elapsed time per iteration (ms): 14644.1 | learning rate: 1.468E-05 | global batch size: 32 | lm loss: 6.597792E+00 | loss scale: 16384.0 | grad norm: 52604.609 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3133/ 159576 | consumed samples: 53008 | elapsed time per iteration (ms): 14641.3 | learning rate: 1.468E-05 | global batch size: 32 | lm loss: 6.527011E+00 | loss scale: 16384.0 | grad norm: 59630.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3134/ 159576 | consumed samples: 53040 | elapsed time per iteration (ms): 14626.4 | learning rate: 1.469E-05 | 
global batch size: 32 | lm loss: 6.581876E+00 | loss scale: 16384.0 | grad norm: 57219.019 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3135/ 159576 | consumed samples: 53072 | elapsed time per iteration (ms): 14774.4 | learning rate: 1.470E-05 | global batch size: 32 | lm loss: 6.708944E+00 | loss scale: 16384.0 | grad norm: 55756.795 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3136/ 159576 | consumed samples: 53104 | elapsed time per iteration (ms): 14618.5 | learning rate: 1.471E-05 | global batch size: 32 | lm loss: 6.679635E+00 | loss scale: 16384.0 | grad norm: 42400.449 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3137/ 159576 | consumed samples: 53136 | elapsed time per iteration (ms): 14614.4 | learning rate: 1.472E-05 | global batch size: 32 | lm loss: 6.469272E+00 | loss scale: 16384.0 | grad norm: 142351.991 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3138/ 159576 | consumed samples: 53168 | elapsed time per iteration (ms): 14596.5 | learning rate: 1.473E-05 | global batch size: 32 | lm loss: 6.554899E+00 | loss scale: 16384.0 | grad norm: 98568.745 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3139/ 159576 | consumed samples: 53200 | elapsed time per iteration (ms): 14719.6 | learning rate: 1.474E-05 | global batch size: 32 | lm loss: 6.618309E+00 | loss scale: 16384.0 | grad norm: 73504.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3140/ 159576 | consumed samples: 53232 | elapsed time per iteration (ms): 14627.2 | learning rate: 1.475E-05 | global batch size: 32 | lm loss: 6.588873E+00 | loss scale: 16384.0 | grad norm: 73534.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3141/ 159576 | consumed samples: 53264 | elapsed time per iteration (ms): 14634.4 | learning rate: 1.476E-05 | global batch size: 32 | lm loss: 6.357007E+00 | loss scale: 16384.0 | grad norm: 84712.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3142/ 159576 | consumed samples: 53296 | elapsed time per iteration (ms): 14717.8 | learning rate: 1.476E-05 | global batch size: 32 | lm loss: 6.623076E+00 | loss scale: 16384.0 | grad norm: 94140.049 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3143/ 159576 | consumed samples: 53328 | elapsed time per iteration (ms): 14697.5 | learning rate: 1.477E-05 | global batch size: 32 | lm loss: 6.562120E+00 | loss scale: 16384.0 | grad norm: 60657.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3144/ 159576 | consumed samples: 53360 | elapsed time per iteration (ms): 14578.1 | learning rate: 1.478E-05 | global batch size: 32 | lm loss: 6.445246E+00 | loss scale: 16384.0 | grad norm: 61798.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3145/ 159576 | consumed samples: 53392 | elapsed time per iteration (ms): 14616.8 | learning rate: 1.479E-05 | global batch size: 32 | lm loss: 6.440137E+00 | loss scale: 16384.0 | grad norm: 72537.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 
3146/ 159576 | consumed samples: 53424 | elapsed time per iteration (ms): 14619.6 | learning rate: 1.480E-05 | global batch size: 32 | lm loss: 6.739626E+00 | loss scale: 16384.0 | grad norm: 53372.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3147/ 159576 | consumed samples: 53456 | elapsed time per iteration (ms): 14895.9 | learning rate: 1.481E-05 | global batch size: 32 | lm loss: 6.588343E+00 | loss scale: 16384.0 | grad norm: 132102.636 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3148/ 159576 | consumed samples: 53488 | elapsed time per iteration (ms): 14681.1 | learning rate: 1.482E-05 | global batch size: 32 | lm loss: 6.551591E+00 | loss scale: 16384.0 | grad norm: 58550.161 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3149/ 159576 | consumed samples: 53520 | elapsed time per iteration (ms): 14682.3 | learning rate: 1.483E-05 | global batch size: 32 | lm loss: 6.632958E+00 | loss scale: 16384.0 | grad norm: 77007.903 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3150/ 159576 | consumed samples: 53552 | elapsed time per iteration (ms): 14624.1 | learning rate: 1.484E-05 | global batch size: 32 | lm loss: 6.648820E+00 | loss scale: 16384.0 | grad norm: 86896.917 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3151/ 159576 | consumed samples: 53584 | elapsed time per iteration (ms): 14845.8 | learning rate: 1.484E-05 | global batch size: 32 | lm loss: 6.446036E+00 | loss scale: 16384.0 | grad norm: 89979.541 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3152/ 159576 | consumed samples: 53616 | elapsed time per iteration (ms): 14727.8 | learning rate: 1.485E-05 | global batch size: 32 | lm loss: 6.617037E+00 | loss scale: 16384.0 | grad norm: 58488.767 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3153/ 159576 | consumed samples: 53648 | elapsed time per iteration (ms): 14649.7 | learning rate: 1.486E-05 | global batch size: 32 | lm loss: 6.529748E+00 | loss scale: 16384.0 | grad norm: 74833.007 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3154/ 159576 | consumed samples: 53680 | elapsed time per iteration (ms): 14647.6 | learning rate: 1.487E-05 | global batch size: 32 | lm loss: 6.562946E+00 | loss scale: 16384.0 | grad norm: 52935.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3155/ 159576 | consumed samples: 53712 | elapsed time per iteration (ms): 15107.7 | learning rate: 1.488E-05 | global batch size: 32 | lm loss: 6.514643E+00 | loss scale: 16384.0 | grad norm: 115570.754 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3156/ 159576 | consumed samples: 53744 | elapsed time per iteration (ms): 14720.1 | learning rate: 1.489E-05 | global batch size: 32 | lm loss: 6.684644E+00 | loss scale: 16384.0 | grad norm: 80957.169 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3157/ 159576 | consumed samples: 53776 | elapsed time per iteration (ms): 14692.8 | learning rate: 1.490E-05 | global batch size: 32 | lm loss: 6.519046E+00 | loss scale: 16384.0 | grad norm: 
55678.918 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3158/ 159576 | consumed samples: 53808 | elapsed time per iteration (ms): 14686.5 | learning rate: 1.491E-05 | global batch size: 32 | lm loss: 6.746099E+00 | loss scale: 16384.0 | grad norm: 90492.004 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3159/ 159576 | consumed samples: 53840 | elapsed time per iteration (ms): 15011.1 | learning rate: 1.492E-05 | global batch size: 32 | lm loss: 6.536778E+00 | loss scale: 16384.0 | grad norm: 71520.569 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3160/ 159576 | consumed samples: 53872 | elapsed time per iteration (ms): 14579.4 | learning rate: 1.492E-05 | global batch size: 32 | lm loss: 6.666056E+00 | loss scale: 16384.0 | grad norm: 84616.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3161/ 159576 | consumed samples: 53904 | elapsed time per iteration (ms): 14644.1 | learning rate: 1.493E-05 | global batch size: 32 | lm loss: 6.597644E+00 | loss scale: 16384.0 | grad norm: 75093.664 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3162/ 159576 | consumed samples: 53936 | elapsed time per iteration (ms): 14697.1 | learning rate: 1.494E-05 | global batch size: 32 | lm loss: 6.446161E+00 | loss scale: 16384.0 | grad norm: 65649.952 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3163/ 159576 | consumed samples: 53968 | elapsed time per iteration (ms): 14947.2 | learning rate: 1.495E-05 | global batch size: 32 | lm loss: 6.681765E+00 | loss scale: 16384.0 | grad norm: 60219.876 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3164/ 159576 | consumed samples: 54000 | elapsed time per iteration (ms): 14663.4 | learning rate: 1.496E-05 | global batch size: 32 | lm loss: 6.525707E+00 | loss scale: 16384.0 | grad norm: 68154.761 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3165/ 159576 | consumed samples: 54032 | elapsed time per iteration (ms): 14769.3 | learning rate: 1.497E-05 | global batch size: 32 | lm loss: 6.587021E+00 | loss scale: 16384.0 | grad norm: 78180.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3166/ 159576 | consumed samples: 54064 | elapsed time per iteration (ms): 14610.2 | learning rate: 1.498E-05 | global batch size: 32 | lm loss: 6.519161E+00 | loss scale: 16384.0 | grad norm: 61912.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3167/ 159576 | consumed samples: 54096 | elapsed time per iteration (ms): 14999.0 | learning rate: 1.499E-05 | global batch size: 32 | lm loss: 6.632318E+00 | loss scale: 16384.0 | grad norm: 108253.525 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3168/ 159576 | consumed samples: 54128 | elapsed time per iteration (ms): 14650.1 | learning rate: 1.500E-05 | global batch size: 32 | lm loss: 6.465475E+00 | loss scale: 16384.0 | grad norm: 62950.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3169/ 159576 | consumed samples: 54160 | elapsed time per iteration (ms): 14661.3 | 
learning rate: 1.500E-05 | global batch size: 32 | lm loss: 6.539711E+00 | loss scale: 16384.0 | grad norm: 92615.638 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3170/ 159576 | consumed samples: 54192 | elapsed time per iteration (ms): 14674.1 | learning rate: 1.501E-05 | global batch size: 32 | lm loss: 6.579189E+00 | loss scale: 16384.0 | grad norm: 83785.863 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3171/ 159576 | consumed samples: 54224 | elapsed time per iteration (ms): 15070.8 | learning rate: 1.502E-05 | global batch size: 32 | lm loss: 6.793476E+00 | loss scale: 16384.0 | grad norm: 62540.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3172/ 159576 | consumed samples: 54256 | elapsed time per iteration (ms): 14666.7 | learning rate: 1.503E-05 | global batch size: 32 | lm loss: 6.584558E+00 | loss scale: 16384.0 | grad norm: 112108.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3173/ 159576 | consumed samples: 54288 | elapsed time per iteration (ms): 14625.8 | learning rate: 1.504E-05 | global batch size: 32 | lm loss: 6.600308E+00 | loss scale: 16384.0 | grad norm: 74654.549 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3174/ 159576 | consumed samples: 54320 | elapsed time per iteration (ms): 14636.6 | learning rate: 1.505E-05 | global batch size: 32 | lm loss: 6.586472E+00 | loss scale: 16384.0 | grad norm: 64570.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3175/ 159576 | consumed samples: 54352 | elapsed time per iteration (ms): 15097.6 | learning rate: 1.506E-05 | global batch size: 32 | lm loss: 6.611074E+00 | loss scale: 16384.0 | grad norm: 67988.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3176/ 159576 | consumed samples: 54384 | elapsed time per iteration (ms): 14507.7 | learning rate: 1.507E-05 | global batch size: 32 | lm loss: 6.524911E+00 | loss scale: 16384.0 | grad norm: 52695.097 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3177/ 159576 | consumed samples: 54416 | elapsed time per iteration (ms): 14667.9 | learning rate: 1.508E-05 | global batch size: 32 | lm loss: 6.622879E+00 | loss scale: 16384.0 | grad norm: 96311.880 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3178/ 159576 | consumed samples: 54448 | elapsed time per iteration (ms): 14717.9 | learning rate: 1.508E-05 | global batch size: 32 | lm loss: 6.557679E+00 | loss scale: 16384.0 | grad norm: 75112.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3179/ 159576 | consumed samples: 54480 | elapsed time per iteration (ms): 15028.6 | learning rate: 1.509E-05 | global batch size: 32 | lm loss: 6.508760E+00 | loss scale: 16384.0 | grad norm: 67929.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3180/ 159576 | consumed samples: 54512 | elapsed time per iteration (ms): 14774.6 | learning rate: 1.510E-05 | global batch size: 32 | lm loss: 6.573524E+00 | loss scale: 16384.0 | grad norm: 76526.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
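
A note for readers skimming this dump: every record above carries the same pipe-separated fields, and within this window `consumed samples` advances by exactly the `global batch size` (32) per step, e.g. 53072 -> 53104 between iterations 3135 and 3136 (the offset from iteration x 32 is presumably left over from the smaller batch sizes used during the earlier ramp-up). Below is a minimal parsing sketch, assuming only the field layout visible in this capture; any official tooling in the repo may differ. Since the capture hard-wraps records across lines, it normalizes whitespace first, and the dangling `time (ms)` token that trails each record is simply not matched.

```python
import re

# One pattern per pipe-separated field, in the order the fields appear
# in every record of this log.
RECORD = re.compile(
    r"iteration\s+(?P<step>\d+)/\s*\d+\s*\|"
    r"\s*consumed samples:\s*(?P<samples>\d+)\s*\|"
    r"\s*elapsed time per iteration \(ms\):\s*(?P<ms>[0-9.]+)\s*\|"
    r"\s*learning rate:\s*(?P<lr>[0-9.E+-]+)\s*\|"
    r"\s*global batch size:\s*(?P<gbs>\d+)\s*\|"
    r"\s*lm loss:\s*(?P<loss>[0-9.E+-]+)\s*\|"
    r"\s*loss scale:\s*(?P<scale>[0-9.]+)\s*\|"
    r"\s*grad norm:\s*(?P<gnorm>[0-9.]+)"
)

def parse_log(raw: str):
    """Yield one dict per iteration record found in a raw log dump."""
    text = " ".join(raw.split())  # undo the hard line-wrapping first
    for m in RECORD.finditer(text):
        yield {
            "step": int(m["step"]),
            "samples": int(m["samples"]),
            "ms": float(m["ms"]),
            "lr": float(m["lr"]),
            "gbs": int(m["gbs"]),
            "loss": float(m["loss"]),
            "scale": float(m["scale"]),
            "grad_norm": float(m["gnorm"]),
        }

def check_bookkeeping(records):
    """consumed samples must advance by exactly global batch size per step."""
    prev = None
    for r in records:
        if prev and r["step"] == prev["step"] + 1:
            assert r["samples"] - prev["samples"] == r["gbs"], r["step"]
        prev = r
```

Run over this section, the sketch recovers iterations 3135 through 3376, and the bookkeeping check passes: consumed samples climbs in steps of exactly 32 throughout.
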
| time (ms) iteration 3181/ 159576 | consumed samples: 54544 | elapsed time per iteration (ms): 14648.5 | learning rate: 1.511E-05 | global batch size: 32 | lm loss: 6.629518E+00 | loss scale: 16384.0 | grad norm: 51441.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3182/ 159576 | consumed samples: 54576 | elapsed time per iteration (ms): 14620.2 | learning rate: 1.512E-05 | global batch size: 32 | lm loss: 6.528477E+00 | loss scale: 16384.0 | grad norm: 84031.596 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3183/ 159576 | consumed samples: 54608 | elapsed time per iteration (ms): 14671.0 | learning rate: 1.513E-05 | global batch size: 32 | lm loss: 6.450350E+00 | loss scale: 16384.0 | grad norm: 47787.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3184/ 159576 | consumed samples: 54640 | elapsed time per iteration (ms): 14835.3 | learning rate: 1.514E-05 | global batch size: 32 | lm loss: 6.547495E+00 | loss scale: 16384.0 | grad norm: 57635.062 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3185/ 159576 | consumed samples: 54672 | elapsed time per iteration (ms): 14691.4 | learning rate: 1.515E-05 | global batch size: 32 | lm loss: 6.438165E+00 | loss scale: 16384.0 | grad norm: 59205.412 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3186/ 159576 | consumed samples: 54704 | elapsed time per iteration (ms): 14599.9 | learning rate: 1.516E-05 | global batch size: 32 | lm loss: 6.543282E+00 | loss scale: 16384.0 | grad norm: 56916.506 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3187/ 159576 | consumed samples: 54736 | elapsed time per iteration (ms): 14594.3 | learning rate: 1.516E-05 | global batch size: 32 | lm loss: 6.619707E+00 | loss scale: 16384.0 | grad norm: 87429.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3188/ 159576 | consumed samples: 54768 | elapsed time per iteration (ms): 14717.0 | learning rate: 1.517E-05 | global batch size: 32 | lm loss: 6.575029E+00 | loss scale: 16384.0 | grad norm: 63063.172 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3189/ 159576 | consumed samples: 54800 | elapsed time per iteration (ms): 14535.7 | learning rate: 1.518E-05 | global batch size: 32 | lm loss: 6.572168E+00 | loss scale: 16384.0 | grad norm: 85759.591 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3190/ 159576 | consumed samples: 54832 | elapsed time per iteration (ms): 14535.8 | learning rate: 1.519E-05 | global batch size: 32 | lm loss: 6.540303E+00 | loss scale: 16384.0 | grad norm: 59464.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3191/ 159576 | consumed samples: 54864 | elapsed time per iteration (ms): 14477.2 | learning rate: 1.520E-05 | global batch size: 32 | lm loss: 6.545095E+00 | loss scale: 16384.0 | grad norm: 53870.876 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3192/ 159576 | consumed samples: 54896 | elapsed time per iteration (ms): 14651.8 | learning rate: 1.521E-05 | global batch size: 32 | lm loss: 6.497169E+00 | loss scale: 
16384.0 | grad norm: 50516.018 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3193/ 159576 | consumed samples: 54928 | elapsed time per iteration (ms): 14555.7 | learning rate: 1.522E-05 | global batch size: 32 | lm loss: 6.354692E+00 | loss scale: 16384.0 | grad norm: 67216.716 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3194/ 159576 | consumed samples: 54960 | elapsed time per iteration (ms): 14548.6 | learning rate: 1.523E-05 | global batch size: 32 | lm loss: 6.704625E+00 | loss scale: 16384.0 | grad norm: 64544.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3195/ 159576 | consumed samples: 54992 | elapsed time per iteration (ms): 14549.1 | learning rate: 1.524E-05 | global batch size: 32 | lm loss: 6.489696E+00 | loss scale: 16384.0 | grad norm: 43746.021 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3196/ 159576 | consumed samples: 55024 | elapsed time per iteration (ms): 14783.1 | learning rate: 1.524E-05 | global batch size: 32 | lm loss: 6.496898E+00 | loss scale: 16384.0 | grad norm: 146573.122 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3197/ 159576 | consumed samples: 55056 | elapsed time per iteration (ms): 14527.9 | learning rate: 1.525E-05 | global batch size: 32 | lm loss: 6.568567E+00 | loss scale: 16384.0 | grad norm: 78804.650 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3198/ 159576 | consumed samples: 55088 | elapsed time per iteration (ms): 14523.2 | learning rate: 1.526E-05 | global batch size: 32 | lm loss: 6.598960E+00 | loss scale: 16384.0 | grad norm: 96783.060 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3199/ 159576 | consumed samples: 55120 | elapsed time per iteration (ms): 14540.7 | learning rate: 1.527E-05 | global batch size: 32 | lm loss: 6.572606E+00 | loss scale: 16384.0 | grad norm: 89417.690 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3200/ 159576 | consumed samples: 55152 | elapsed time per iteration (ms): 15008.9 | learning rate: 1.528E-05 | global batch size: 32 | lm loss: 6.506562E+00 | loss scale: 16384.0 | grad norm: 41993.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3201/ 159576 | consumed samples: 55184 | elapsed time per iteration (ms): 14658.0 | learning rate: 1.529E-05 | global batch size: 32 | lm loss: 6.782739E+00 | loss scale: 16384.0 | grad norm: 352113.189 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3202/ 159576 | consumed samples: 55216 | elapsed time per iteration (ms): 14567.2 | learning rate: 1.530E-05 | global batch size: 32 | lm loss: 6.567737E+00 | loss scale: 16384.0 | grad norm: 255563.854 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3203/ 159576 | consumed samples: 55248 | elapsed time per iteration (ms): 14521.2 | learning rate: 1.531E-05 | global batch size: 32 | lm loss: 6.758952E+00 | loss scale: 16384.0 | grad norm: 132639.437 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3204/ 159576 | consumed samples: 55280 | elapsed time per 
iteration (ms): 15057.0 | learning rate: 1.532E-05 | global batch size: 32 | lm loss: 6.644050E+00 | loss scale: 16384.0 | grad norm: 95206.523 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3205/ 159576 | consumed samples: 55312 | elapsed time per iteration (ms): 14632.3 | learning rate: 1.532E-05 | global batch size: 32 | lm loss: 6.559070E+00 | loss scale: 16384.0 | grad norm: 92448.552 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3206/ 159576 | consumed samples: 55344 | elapsed time per iteration (ms): 14560.7 | learning rate: 1.533E-05 | global batch size: 32 | lm loss: 6.544364E+00 | loss scale: 16384.0 | grad norm: 87185.641 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3207/ 159576 | consumed samples: 55376 | elapsed time per iteration (ms): 14559.6 | learning rate: 1.534E-05 | global batch size: 32 | lm loss: 6.617725E+00 | loss scale: 16384.0 | grad norm: 147534.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3208/ 159576 | consumed samples: 55408 | elapsed time per iteration (ms): 14919.1 | learning rate: 1.535E-05 | global batch size: 32 | lm loss: 6.505226E+00 | loss scale: 16384.0 | grad norm: 82317.664 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3209/ 159576 | consumed samples: 55440 | elapsed time per iteration (ms): 14628.9 | learning rate: 1.536E-05 | global batch size: 32 | lm loss: 6.529959E+00 | loss scale: 16384.0 | grad norm: 62063.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3210/ 159576 | consumed samples: 55472 | elapsed time per iteration (ms): 14562.8 | learning rate: 1.537E-05 | global batch size: 32 | lm loss: 6.499523E+00 | loss scale: 16384.0 | grad norm: 59027.974 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3211/ 159576 | consumed samples: 55504 | elapsed time per iteration (ms): 14551.3 | learning rate: 1.538E-05 | global batch size: 32 | lm loss: 6.612097E+00 | loss scale: 16384.0 | grad norm: 142076.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3212/ 159576 | consumed samples: 55536 | elapsed time per iteration (ms): 14906.9 | learning rate: 1.539E-05 | global batch size: 32 | lm loss: 6.726549E+00 | loss scale: 16384.0 | grad norm: 85971.039 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3213/ 159576 | consumed samples: 55568 | elapsed time per iteration (ms): 14484.4 | learning rate: 1.539E-05 | global batch size: 32 | lm loss: 6.627134E+00 | loss scale: 16384.0 | grad norm: 74784.069 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3214/ 159576 | consumed samples: 55600 | elapsed time per iteration (ms): 14568.5 | learning rate: 1.540E-05 | global batch size: 32 | lm loss: 6.684568E+00 | loss scale: 16384.0 | grad norm: 85537.156 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3215/ 159576 | consumed samples: 55632 | elapsed time per iteration (ms): 14541.7 | learning rate: 1.541E-05 | global batch size: 32 | lm loss: 6.632449E+00 | loss scale: 16384.0 | grad norm: 118554.262 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | time (ms) iteration 3216/ 159576 | consumed samples: 55664 | elapsed time per iteration (ms): 14903.9 | learning rate: 1.542E-05 | global batch size: 32 | lm loss: 6.491426E+00 | loss scale: 16384.0 | grad norm: 66361.502 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3217/ 159576 | consumed samples: 55696 | elapsed time per iteration (ms): 14654.1 | learning rate: 1.543E-05 | global batch size: 32 | lm loss: 6.599683E+00 | loss scale: 16384.0 | grad norm: 66284.456 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3218/ 159576 | consumed samples: 55728 | elapsed time per iteration (ms): 14564.4 | learning rate: 1.544E-05 | global batch size: 32 | lm loss: 6.671634E+00 | loss scale: 16384.0 | grad norm: 48626.750 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3219/ 159576 | consumed samples: 55760 | elapsed time per iteration (ms): 14567.8 | learning rate: 1.545E-05 | global batch size: 32 | lm loss: 6.653804E+00 | loss scale: 16384.0 | grad norm: 84407.596 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3220/ 159576 | consumed samples: 55792 | elapsed time per iteration (ms): 14939.3 | learning rate: 1.546E-05 | global batch size: 32 | lm loss: 6.519379E+00 | loss scale: 16384.0 | grad norm: 72885.533 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3221/ 159576 | consumed samples: 55824 | elapsed time per iteration (ms): 14579.8 | learning rate: 1.547E-05 | global batch size: 32 | lm loss: 6.658468E+00 | loss scale: 16384.0 | grad norm: 69063.419 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3222/ 159576 | consumed samples: 55856 | elapsed time per iteration (ms): 14568.3 | learning rate: 1.547E-05 | global batch size: 32 | lm loss: 6.544227E+00 | loss scale: 16384.0 | grad norm: 94167.013 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3223/ 159576 | consumed samples: 55888 | elapsed time per iteration (ms): 14530.3 | learning rate: 1.548E-05 | global batch size: 32 | lm loss: 6.519998E+00 | loss scale: 16384.0 | grad norm: 74630.691 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3224/ 159576 | consumed samples: 55920 | elapsed time per iteration (ms): 14849.7 | learning rate: 1.549E-05 | global batch size: 32 | lm loss: 6.586551E+00 | loss scale: 16384.0 | grad norm: 76630.181 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3225/ 159576 | consumed samples: 55952 | elapsed time per iteration (ms): 14888.8 | learning rate: 1.550E-05 | global batch size: 32 | lm loss: 6.687891E+00 | loss scale: 16384.0 | grad norm: 70630.932 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3226/ 159576 | consumed samples: 55984 | elapsed time per iteration (ms): 14540.3 | learning rate: 1.551E-05 | global batch size: 32 | lm loss: 6.595382E+00 | loss scale: 16384.0 | grad norm: 92178.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3227/ 159576 | consumed samples: 56016 | elapsed time per iteration (ms): 14557.7 | learning rate: 1.552E-05 | global batch size: 32 | lm loss: 
6.364616E+00 | loss scale: 16384.0 | grad norm: 62395.737 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3228/ 159576 | consumed samples: 56048 | elapsed time per iteration (ms): 14547.2 | learning rate: 1.553E-05 | global batch size: 32 | lm loss: 6.614971E+00 | loss scale: 16384.0 | grad norm: 72348.132 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3229/ 159576 | consumed samples: 56080 | elapsed time per iteration (ms): 14765.8 | learning rate: 1.554E-05 | global batch size: 32 | lm loss: 6.527470E+00 | loss scale: 16384.0 | grad norm: 70068.847 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3230/ 159576 | consumed samples: 56112 | elapsed time per iteration (ms): 14547.7 | learning rate: 1.555E-05 | global batch size: 32 | lm loss: 6.691795E+00 | loss scale: 16384.0 | grad norm: 79540.792 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3231/ 159576 | consumed samples: 56144 | elapsed time per iteration (ms): 14659.9 | learning rate: 1.555E-05 | global batch size: 32 | lm loss: 6.541613E+00 | loss scale: 16384.0 | grad norm: 49841.975 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3232/ 159576 | consumed samples: 56176 | elapsed time per iteration (ms): 14501.9 | learning rate: 1.556E-05 | global batch size: 32 | lm loss: 6.634310E+00 | loss scale: 16384.0 | grad norm: 67541.885 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3233/ 159576 | consumed samples: 56208 | elapsed time per iteration (ms): 14751.5 | learning rate: 1.557E-05 | global batch size: 32 | lm loss: 6.538262E+00 | loss scale: 16384.0 | grad norm: 60234.071 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3234/ 159576 | consumed samples: 56240 | elapsed time per iteration (ms): 14540.9 | learning rate: 1.558E-05 | global batch size: 32 | lm loss: 6.572741E+00 | loss scale: 16384.0 | grad norm: 51996.631 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3235/ 159576 | consumed samples: 56272 | elapsed time per iteration (ms): 14525.6 | learning rate: 1.559E-05 | global batch size: 32 | lm loss: 6.514688E+00 | loss scale: 16384.0 | grad norm: 80129.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3236/ 159576 | consumed samples: 56304 | elapsed time per iteration (ms): 14525.2 | learning rate: 1.560E-05 | global batch size: 32 | lm loss: 6.597489E+00 | loss scale: 16384.0 | grad norm: 106848.471 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3237/ 159576 | consumed samples: 56336 | elapsed time per iteration (ms): 14776.9 | learning rate: 1.561E-05 | global batch size: 32 | lm loss: 6.556981E+00 | loss scale: 16384.0 | grad norm: 71439.752 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3238/ 159576 | consumed samples: 56368 | elapsed time per iteration (ms): 14561.5 | learning rate: 1.562E-05 | global batch size: 32 | lm loss: 6.569613E+00 | loss scale: 16384.0 | grad norm: 70525.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3239/ 159576 | consumed samples: 
56400 | elapsed time per iteration (ms): 14478.4 | learning rate: 1.563E-05 | global batch size: 32 | lm loss: 6.541091E+00 | loss scale: 16384.0 | grad norm: 47017.630 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3240/ 159576 | consumed samples: 56432 | elapsed time per iteration (ms): 14587.1 | learning rate: 1.563E-05 | global batch size: 32 | lm loss: 6.697134E+00 | loss scale: 16384.0 | grad norm: 53866.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3241/ 159576 | consumed samples: 56464 | elapsed time per iteration (ms): 14901.2 | learning rate: 1.564E-05 | global batch size: 32 | lm loss: 6.463998E+00 | loss scale: 16384.0 | grad norm: 72517.961 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3242/ 159576 | consumed samples: 56496 | elapsed time per iteration (ms): 14602.2 | learning rate: 1.565E-05 | global batch size: 32 | lm loss: 6.557918E+00 | loss scale: 16384.0 | grad norm: 51986.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3243/ 159576 | consumed samples: 56528 | elapsed time per iteration (ms): 14553.6 | learning rate: 1.566E-05 | global batch size: 32 | lm loss: 6.491773E+00 | loss scale: 16384.0 | grad norm: 68222.486 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3244/ 159576 | consumed samples: 56560 | elapsed time per iteration (ms): 14559.7 | learning rate: 1.567E-05 | global batch size: 32 | lm loss: 6.590208E+00 | loss scale: 16384.0 | grad norm: 72691.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3245/ 159576 | consumed samples: 56592 | elapsed time per iteration (ms): 14894.6 | learning rate: 1.568E-05 | global batch size: 32 | lm loss: 6.551069E+00 | loss scale: 16384.0 | grad norm: 71227.557 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3246/ 159576 | consumed samples: 56624 | elapsed time per iteration (ms): 14706.4 | learning rate: 1.569E-05 | global batch size: 32 | lm loss: 6.536276E+00 | loss scale: 16384.0 | grad norm: 77853.983 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3247/ 159576 | consumed samples: 56656 | elapsed time per iteration (ms): 14557.1 | learning rate: 1.570E-05 | global batch size: 32 | lm loss: 6.547366E+00 | loss scale: 16384.0 | grad norm: 91853.496 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3248/ 159576 | consumed samples: 56688 | elapsed time per iteration (ms): 14512.9 | learning rate: 1.571E-05 | global batch size: 32 | lm loss: 6.604490E+00 | loss scale: 16384.0 | grad norm: 61725.711 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3249/ 159576 | consumed samples: 56720 | elapsed time per iteration (ms): 14949.1 | learning rate: 1.571E-05 | global batch size: 32 | lm loss: 6.555557E+00 | loss scale: 16384.0 | grad norm: 55414.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3250/ 159576 | consumed samples: 56752 | elapsed time per iteration (ms): 14468.6 | learning rate: 1.572E-05 | global batch size: 32 | lm loss: 6.471034E+00 | loss scale: 16384.0 | grad norm: 39264.272 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3251/ 159576 | consumed samples: 56784 | elapsed time per iteration (ms): 14601.9 | learning rate: 1.573E-05 | global batch size: 32 | lm loss: 6.472137E+00 | loss scale: 16384.0 | grad norm: 51720.854 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3252/ 159576 | consumed samples: 56816 | elapsed time per iteration (ms): 14481.3 | learning rate: 1.574E-05 | global batch size: 32 | lm loss: 6.564797E+00 | loss scale: 16384.0 | grad norm: 55129.631 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3253/ 159576 | consumed samples: 56848 | elapsed time per iteration (ms): 14865.7 | learning rate: 1.575E-05 | global batch size: 32 | lm loss: 6.433147E+00 | loss scale: 16384.0 | grad norm: 48761.095 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3254/ 159576 | consumed samples: 56880 | elapsed time per iteration (ms): 14607.7 | learning rate: 1.576E-05 | global batch size: 32 | lm loss: 6.486347E+00 | loss scale: 16384.0 | grad norm: 51447.567 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3255/ 159576 | consumed samples: 56912 | elapsed time per iteration (ms): 14476.2 | learning rate: 1.577E-05 | global batch size: 32 | lm loss: 6.670080E+00 | loss scale: 16384.0 | grad norm: 49692.027 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3256/ 159576 | consumed samples: 56944 | elapsed time per iteration (ms): 14532.2 | learning rate: 1.578E-05 | global batch size: 32 | lm loss: 6.449496E+00 | loss scale: 16384.0 | grad norm: 46597.035 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3257/ 159576 | consumed samples: 56976 | elapsed time per iteration (ms): 14907.4 | learning rate: 1.579E-05 | global batch size: 32 | lm loss: 6.651023E+00 | loss scale: 16384.0 | grad norm: 50509.142 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3258/ 159576 | consumed samples: 57008 | elapsed time per iteration (ms): 14521.0 | learning rate: 1.579E-05 | global batch size: 32 | lm loss: 6.557060E+00 | loss scale: 16384.0 | grad norm: 46431.742 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3259/ 159576 | consumed samples: 57040 | elapsed time per iteration (ms): 14527.8 | learning rate: 1.580E-05 | global batch size: 32 | lm loss: 6.802115E+00 | loss scale: 16384.0 | grad norm: 46019.500 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3260/ 159576 | consumed samples: 57072 | elapsed time per iteration (ms): 14560.3 | learning rate: 1.581E-05 | global batch size: 32 | lm loss: 6.480462E+00 | loss scale: 16384.0 | grad norm: 54023.847 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3261/ 159576 | consumed samples: 57104 | elapsed time per iteration (ms): 14898.0 | learning rate: 1.582E-05 | global batch size: 32 | lm loss: 6.696016E+00 | loss scale: 16384.0 | grad norm: 51541.626 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3262/ 159576 | consumed samples: 57136 | elapsed time per iteration (ms): 14574.6 | learning rate: 1.583E-05 | global batch 
size: 32 | lm loss: 6.633371E+00 | loss scale: 16384.0 | grad norm: 64314.799 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3263/ 159576 | consumed samples: 57168 | elapsed time per iteration (ms): 14524.2 | learning rate: 1.584E-05 | global batch size: 32 | lm loss: 6.540409E+00 | loss scale: 16384.0 | grad norm: 53098.615 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3264/ 159576 | consumed samples: 57200 | elapsed time per iteration (ms): 14557.6 | learning rate: 1.585E-05 | global batch size: 32 | lm loss: 6.376970E+00 | loss scale: 32768.0 | grad norm: 75107.502 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3265/ 159576 | consumed samples: 57232 | elapsed time per iteration (ms): 14784.4 | learning rate: 1.586E-05 | global batch size: 32 | lm loss: 6.602743E+00 | loss scale: 32768.0 | grad norm: 125297.978 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3266/ 159576 | consumed samples: 57264 | elapsed time per iteration (ms): 14634.8 | learning rate: 1.587E-05 | global batch size: 32 | lm loss: 6.514446E+00 | loss scale: 32768.0 | grad norm: 194672.983 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3267/ 159576 | consumed samples: 57296 | elapsed time per iteration (ms): 14570.9 | learning rate: 1.587E-05 | global batch size: 32 | lm loss: 6.630837E+00 | loss scale: 32768.0 | grad norm: 107205.101 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3268/ 159576 | consumed samples: 57328 | elapsed time per iteration (ms): 14454.1 | learning rate: 1.588E-05 | global batch size: 32 | lm loss: 6.541512E+00 | loss scale: 32768.0 | grad norm: 112309.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3269/ 159576 | consumed samples: 57360 | elapsed time per iteration (ms): 14551.3 | learning rate: 1.589E-05 | global batch size: 32 | lm loss: 6.542883E+00 | loss scale: 32768.0 | grad norm: 132672.039 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3270/ 159576 | consumed samples: 57392 | elapsed time per iteration (ms): 14718.7 | learning rate: 1.590E-05 | global batch size: 32 | lm loss: 6.448256E+00 | loss scale: 32768.0 | grad norm: 151950.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3271/ 159576 | consumed samples: 57424 | elapsed time per iteration (ms): 14527.0 | learning rate: 1.591E-05 | global batch size: 32 | lm loss: 6.688755E+00 | loss scale: 32768.0 | grad norm: 91675.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3272/ 159576 | consumed samples: 57456 | elapsed time per iteration (ms): 14559.6 | learning rate: 1.592E-05 | global batch size: 32 | lm loss: 6.550324E+00 | loss scale: 32768.0 | grad norm: 241437.766 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3273/ 159576 | consumed samples: 57488 | elapsed time per iteration (ms): 14521.4 | learning rate: 1.593E-05 | global batch size: 32 | lm loss: 6.620804E+00 | loss scale: 32768.0 | grad norm: 130842.612 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3274/ 
159576 | consumed samples: 57520 | elapsed time per iteration (ms): 14697.5 | learning rate: 1.594E-05 | global batch size: 32 | lm loss: 6.459725E+00 | loss scale: 32768.0 | grad norm: 146465.533 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3275/ 159576 | consumed samples: 57552 | elapsed time per iteration (ms): 14476.2 | learning rate: 1.595E-05 | global batch size: 32 | lm loss: 6.576751E+00 | loss scale: 32768.0 | grad norm: 114711.531 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3276/ 159576 | consumed samples: 57584 | elapsed time per iteration (ms): 14512.4 | learning rate: 1.595E-05 | global batch size: 32 | lm loss: 6.599717E+00 | loss scale: 32768.0 | grad norm: 283220.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3277/ 159576 | consumed samples: 57616 | elapsed time per iteration (ms): 14565.0 | learning rate: 1.596E-05 | global batch size: 32 | lm loss: 6.395351E+00 | loss scale: 32768.0 | grad norm: 206105.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3278/ 159576 | consumed samples: 57648 | elapsed time per iteration (ms): 14816.8 | learning rate: 1.597E-05 | global batch size: 32 | lm loss: 6.569580E+00 | loss scale: 32768.0 | grad norm: 183586.115 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3279/ 159576 | consumed samples: 57680 | elapsed time per iteration (ms): 14615.5 | learning rate: 1.598E-05 | global batch size: 32 | lm loss: 6.572281E+00 | loss scale: 32768.0 | grad norm: 161878.009 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3280/ 159576 | consumed samples: 57712 | elapsed time per iteration (ms): 14521.1 | learning rate: 1.599E-05 | global batch size: 32 | lm loss: 6.513469E+00 | loss scale: 32768.0 | grad norm: 134922.801 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3281/ 159576 | consumed samples: 57744 | elapsed time per iteration (ms): 14549.6 | learning rate: 1.600E-05 | global batch size: 32 | lm loss: 6.680450E+00 | loss scale: 32768.0 | grad norm: 214593.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3282/ 159576 | consumed samples: 57776 | elapsed time per iteration (ms): 14885.6 | learning rate: 1.601E-05 | global batch size: 32 | lm loss: 6.528894E+00 | loss scale: 32768.0 | grad norm: 136120.139 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3283/ 159576 | consumed samples: 57808 | elapsed time per iteration (ms): 14648.1 | learning rate: 1.602E-05 | global batch size: 32 | lm loss: 6.610715E+00 | loss scale: 32768.0 | grad norm: 124689.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3284/ 159576 | consumed samples: 57840 | elapsed time per iteration (ms): 14446.0 | learning rate: 1.603E-05 | global batch size: 32 | lm loss: 6.493599E+00 | loss scale: 32768.0 | grad norm: 193703.601 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3285/ 159576 | consumed samples: 57872 | elapsed time per iteration (ms): 14530.4 | learning rate: 1.603E-05 | global batch size: 32 | lm loss: 6.495665E+00 | loss scale: 32768.0 | grad norm: 
180680.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3286/ 159576 | consumed samples: 57904 | elapsed time per iteration (ms): 15079.8 | learning rate: 1.604E-05 | global batch size: 32 | lm loss: 6.484368E+00 | loss scale: 32768.0 | grad norm: 151352.752 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3287/ 159576 | consumed samples: 57936 | elapsed time per iteration (ms): 14519.7 | learning rate: 1.605E-05 | global batch size: 32 | lm loss: 6.533234E+00 | loss scale: 32768.0 | grad norm: 135972.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3288/ 159576 | consumed samples: 57968 | elapsed time per iteration (ms): 14502.1 | learning rate: 1.606E-05 | global batch size: 32 | lm loss: 6.485931E+00 | loss scale: 32768.0 | grad norm: 175469.781 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3289/ 159576 | consumed samples: 58000 | elapsed time per iteration (ms): 14650.6 | learning rate: 1.607E-05 | global batch size: 32 | lm loss: 6.588792E+00 | loss scale: 32768.0 | grad norm: 95804.558 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3290/ 159576 | consumed samples: 58032 | elapsed time per iteration (ms): 15011.0 | learning rate: 1.608E-05 | global batch size: 32 | lm loss: 6.649066E+00 | loss scale: 32768.0 | grad norm: 158912.035 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3291/ 159576 | consumed samples: 58064 | elapsed time per iteration (ms): 14545.2 | learning rate: 1.609E-05 | global batch size: 32 | lm loss: 6.518328E+00 | loss scale: 32768.0 | grad norm: 143118.125 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3292/ 159576 | consumed samples: 58096 | elapsed time per iteration (ms): 14548.9 | learning rate: 1.610E-05 | global batch size: 32 | lm loss: 6.497085E+00 | loss scale: 32768.0 | grad norm: 242609.168 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3293/ 159576 | consumed samples: 58128 | elapsed time per iteration (ms): 14674.4 | learning rate: 1.611E-05 | global batch size: 32 | lm loss: 6.516074E+00 | loss scale: 32768.0 | grad norm: 230563.177 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3294/ 159576 | consumed samples: 58160 | elapsed time per iteration (ms): 15018.5 | learning rate: 1.611E-05 | global batch size: 32 | lm loss: 6.357250E+00 | loss scale: 32768.0 | grad norm: 145279.947 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3295/ 159576 | consumed samples: 58192 | elapsed time per iteration (ms): 14502.4 | learning rate: 1.612E-05 | global batch size: 32 | lm loss: 6.532835E+00 | loss scale: 32768.0 | grad norm: 159209.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3296/ 159576 | consumed samples: 58224 | elapsed time per iteration (ms): 14618.1 | learning rate: 1.613E-05 | global batch size: 32 | lm loss: 6.610238E+00 | loss scale: 32768.0 | grad norm: 103662.453 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3297/ 159576 | consumed samples: 58256 | elapsed time per iteration (ms): 
14641.0 | learning rate: 1.614E-05 | global batch size: 32 | lm loss: 6.559636E+00 | loss scale: 32768.0 | grad norm: 342247.705 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3298/ 159576 | consumed samples: 58288 | elapsed time per iteration (ms): 14987.0 | learning rate: 1.615E-05 | global batch size: 32 | lm loss: 6.595356E+00 | loss scale: 32768.0 | grad norm: 185444.091 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3299/ 159576 | consumed samples: 58320 | elapsed time per iteration (ms): 14547.8 | learning rate: 1.616E-05 | global batch size: 32 | lm loss: 6.538537E+00 | loss scale: 32768.0 | grad norm: 145127.777 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3300/ 159576 | consumed samples: 58352 | elapsed time per iteration (ms): 14643.9 | learning rate: 1.617E-05 | global batch size: 32 | lm loss: 6.453721E+00 | loss scale: 32768.0 | grad norm: 235646.726 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3301/ 159576 | consumed samples: 58384 | elapsed time per iteration (ms): 14648.1 | learning rate: 1.618E-05 | global batch size: 32 | lm loss: 6.672456E+00 | loss scale: 32768.0 | grad norm: 131805.014 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3302/ 159576 | consumed samples: 58416 | elapsed time per iteration (ms): 15043.8 | learning rate: 1.618E-05 | global batch size: 32 | lm loss: 6.513996E+00 | loss scale: 32768.0 | grad norm: 172559.158 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3303/ 159576 | consumed samples: 58448 | elapsed time per iteration (ms): 14557.7 | learning rate: 1.619E-05 | global batch size: 32 | lm loss: 6.688443E+00 | loss scale: 32768.0 | grad norm: 154181.843 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3304/ 159576 | consumed samples: 58480 | elapsed time per iteration (ms): 14541.6 | learning rate: 1.620E-05 | global batch size: 32 | lm loss: 6.865191E+00 | loss scale: 32768.0 | grad norm: 171141.876 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3305/ 159576 | consumed samples: 58512 | elapsed time per iteration (ms): 14558.8 | learning rate: 1.621E-05 | global batch size: 32 | lm loss: 6.529626E+00 | loss scale: 32768.0 | grad norm: 112641.720 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3306/ 159576 | consumed samples: 58544 | elapsed time per iteration (ms): 14971.5 | learning rate: 1.622E-05 | global batch size: 32 | lm loss: 6.571610E+00 | loss scale: 32768.0 | grad norm: 115411.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3307/ 159576 | consumed samples: 58576 | elapsed time per iteration (ms): 14532.6 | learning rate: 1.623E-05 | global batch size: 32 | lm loss: 6.792900E+00 | loss scale: 32768.0 | grad norm: 153224.669 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3308/ 159576 | consumed samples: 58608 | elapsed time per iteration (ms): 14639.5 | learning rate: 1.624E-05 | global batch size: 32 | lm loss: 6.490854E+00 | loss scale: 32768.0 | grad norm: 125276.183 | num zeros: 0.0 | number of skipped iterations: 0 | number 
of nan iterations: 0 | time (ms) iteration 3309/ 159576 | consumed samples: 58640 | elapsed time per iteration (ms): 14639.4 | learning rate: 1.625E-05 | global batch size: 32 | lm loss: 6.604795E+00 | loss scale: 32768.0 | grad norm: 163307.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3310/ 159576 | consumed samples: 58672 | elapsed time per iteration (ms): 14641.3 | learning rate: 1.626E-05 | global batch size: 32 | lm loss: 6.486001E+00 | loss scale: 32768.0 | grad norm: 169732.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3311/ 159576 | consumed samples: 58704 | elapsed time per iteration (ms): 14763.3 | learning rate: 1.626E-05 | global batch size: 32 | lm loss: 6.513995E+00 | loss scale: 32768.0 | grad norm: 106129.577 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3312/ 159576 | consumed samples: 58736 | elapsed time per iteration (ms): 14481.4 | learning rate: 1.627E-05 | global batch size: 32 | lm loss: 6.538834E+00 | loss scale: 32768.0 | grad norm: 143827.047 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3313/ 159576 | consumed samples: 58768 | elapsed time per iteration (ms): 14535.0 | learning rate: 1.628E-05 | global batch size: 32 | lm loss: 6.508898E+00 | loss scale: 32768.0 | grad norm: 96517.736 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3314/ 159576 | consumed samples: 58800 | elapsed time per iteration (ms): 14389.3 | learning rate: 1.629E-05 | global batch size: 32 | lm loss: 6.557344E+00 | loss scale: 32768.0 | grad norm: 160647.640 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3315/ 159576 | consumed samples: 58832 | elapsed time per iteration (ms): 14617.9 | learning rate: 1.630E-05 | global batch size: 32 | lm loss: 6.579730E+00 | loss scale: 32768.0 | grad norm: 166511.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3316/ 159576 | consumed samples: 58864 | elapsed time per iteration (ms): 14527.6 | learning rate: 1.631E-05 | global batch size: 32 | lm loss: 6.510201E+00 | loss scale: 32768.0 | grad norm: 147882.179 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3317/ 159576 | consumed samples: 58896 | elapsed time per iteration (ms): 14470.3 | learning rate: 1.632E-05 | global batch size: 32 | lm loss: 6.570679E+00 | loss scale: 32768.0 | grad norm: 133948.873 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3318/ 159576 | consumed samples: 58928 | elapsed time per iteration (ms): 14503.9 | learning rate: 1.633E-05 | global batch size: 32 | lm loss: 6.505450E+00 | loss scale: 32768.0 | grad norm: 117987.444 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3319/ 159576 | consumed samples: 58960 | elapsed time per iteration (ms): 14576.7 | learning rate: 1.634E-05 | global batch size: 32 | lm loss: 6.637349E+00 | loss scale: 32768.0 | grad norm: 158753.005 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3320/ 159576 | consumed samples: 58992 | elapsed time per iteration (ms): 14474.5 | learning rate: 1.634E-05 | global batch size: 32 | lm loss: 
6.463197E+00 | loss scale: 32768.0 | grad norm: 133223.814 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3321/ 159576 | consumed samples: 59024 | elapsed time per iteration (ms): 14495.2 | learning rate: 1.635E-05 | global batch size: 32 | lm loss: 6.754025E+00 | loss scale: 32768.0 | grad norm: 147882.857 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3322/ 159576 | consumed samples: 59056 | elapsed time per iteration (ms): 14426.8 | learning rate: 1.636E-05 | global batch size: 32 | lm loss: 6.377756E+00 | loss scale: 32768.0 | grad norm: 107176.477 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3323/ 159576 | consumed samples: 59088 | elapsed time per iteration (ms): 14894.2 | learning rate: 1.637E-05 | global batch size: 32 | lm loss: 6.485399E+00 | loss scale: 32768.0 | grad norm: 104276.979 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3324/ 159576 | consumed samples: 59120 | elapsed time per iteration (ms): 14539.8 | learning rate: 1.638E-05 | global batch size: 32 | lm loss: 6.595620E+00 | loss scale: 32768.0 | grad norm: 102253.453 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3325/ 159576 | consumed samples: 59152 | elapsed time per iteration (ms): 14528.7 | learning rate: 1.639E-05 | global batch size: 32 | lm loss: 6.372971E+00 | loss scale: 32768.0 | grad norm: 170203.107 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3326/ 159576 | consumed samples: 59184 | elapsed time per iteration (ms): 14629.3 | learning rate: 1.640E-05 | global batch size: 32 | lm loss: 6.460327E+00 | loss scale: 32768.0 | grad norm: 108888.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3327/ 159576 | consumed samples: 59216 | elapsed time per iteration (ms): 15011.9 | learning rate: 1.641E-05 | global batch size: 32 | lm loss: 6.462082E+00 | loss scale: 32768.0 | grad norm: 154915.863 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3328/ 159576 | consumed samples: 59248 | elapsed time per iteration (ms): 14457.0 | learning rate: 1.642E-05 | global batch size: 32 | lm loss: 6.526529E+00 | loss scale: 32768.0 | grad norm: 135486.854 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3329/ 159576 | consumed samples: 59280 | elapsed time per iteration (ms): 14493.0 | learning rate: 1.642E-05 | global batch size: 32 | lm loss: 6.546029E+00 | loss scale: 32768.0 | grad norm: 97252.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3330/ 159576 | consumed samples: 59312 | elapsed time per iteration (ms): 14488.7 | learning rate: 1.643E-05 | global batch size: 32 | lm loss: 6.540400E+00 | loss scale: 32768.0 | grad norm: 234564.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3331/ 159576 | consumed samples: 59344 | elapsed time per iteration (ms): 14982.7 | learning rate: 1.644E-05 | global batch size: 32 | lm loss: 6.473689E+00 | loss scale: 32768.0 | grad norm: 104411.499 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3332/ 159576 | consumed 
samples: 59376 | elapsed time per iteration (ms): 14455.1 | learning rate: 1.645E-05 | global batch size: 32 | lm loss: 6.589927E+00 | loss scale: 32768.0 | grad norm: 240696.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-24 15:06:48] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition) [2021-09-24 15:06:48] PULSE: tr8-104B is running for 9:14:37 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8]) iteration 3333/ 159576 | consumed samples: 59408 | elapsed time per iteration (ms): 14571.6 | learning rate: 1.646E-05 | global batch size: 32 | lm loss: 6.604051E+00 | loss scale: 32768.0 | grad norm: 150869.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3334/ 159576 | consumed samples: 59440 | elapsed time per iteration (ms): 14495.5 | learning rate: 1.647E-05 | global batch size: 32 | lm loss: 6.565775E+00 | loss scale: 32768.0 | grad norm: 141203.105 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3335/ 159576 | consumed samples: 59472 | elapsed time per iteration (ms): 14896.4 | learning rate: 1.648E-05 | global batch size: 32 | lm loss: 6.456505E+00 | loss scale: 32768.0 | grad norm: 145244.969 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3336/ 159576 | consumed samples: 59504 | elapsed time per iteration (ms): 14515.3 | learning rate: 1.649E-05 | global batch size: 32 | lm loss: 6.488969E+00 | loss scale: 32768.0 | grad norm: 246097.062 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3337/ 159576 | consumed samples: 59536 | elapsed time per iteration (ms): 14492.7 | learning rate: 1.650E-05 | global batch size: 32 | lm loss: 6.455498E+00 | loss scale: 32768.0 | grad norm: 130955.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3338/ 159576 | consumed samples: 59568 | elapsed time per iteration (ms): 14531.1 | learning rate: 1.650E-05 | global batch size: 32 | lm loss: 6.593586E+00 | loss scale: 32768.0 | grad norm: 136721.801 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3339/ 159576 | consumed samples: 59600 | elapsed time per iteration (ms): 14962.3 | learning rate: 1.651E-05 | global batch size: 32 | lm loss: 6.564628E+00 | loss scale: 32768.0 | grad norm: 141976.785 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3340/ 159576 | consumed samples: 59632 | elapsed time per iteration (ms): 14550.8 | learning rate: 1.652E-05 | global batch size: 32 | lm loss: 6.373518E+00 | loss scale: 32768.0 | grad norm: 113008.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3341/ 159576 | consumed samples: 59664 | elapsed time per iteration (ms): 14563.2 | learning rate: 1.653E-05 | global batch size: 32 | lm loss: 6.658302E+00 | loss scale: 32768.0 | grad norm: 113653.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) 
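
The two PULSE lines interleaved above are heartbeat messages from a monitoring process, not Megatron output: one reports that the next tr8-104B job (1165978_[1-10%1]) is queued behind the current one via SLURM's dependency mechanism, the other that the current job (1162855_1) has been running for 9:14:37 on the listed 'gpu_p13' nodes. A minimal sketch of how such a heartbeat could be produced with standard SLURM tooling follows; the actual monitoring script used for these chronicles may differ.

```python
import subprocess
from datetime import datetime

def pulse(job_name: str = "tr8-104B") -> None:
    """Print a one-line heartbeat for every SLURM job matching job_name."""
    # squeue format codes: %i=job id, %P=partition, %T=state,
    # %M=elapsed time, %S=start time
    out = subprocess.run(
        ["squeue", "--noheader", "--name", job_name,
         "-o", "%i|%P|%T|%M|%S"],
        capture_output=True, text=True, check=True,
    ).stdout
    stamp = datetime.now().strftime("[%Y-%m-%d %H:%M:%S]")
    for line in out.splitlines():
        job_id, partition, state, elapsed, start = line.split("|")
        if state == "RUNNING":
            print(f"{stamp} PULSE: {job_name} is running for {elapsed} "
                  f"since {start} ({job_id} on '{partition}' partition)")
        else:  # e.g. PENDING behind a dependency
            print(f"{stamp} PULSE: {job_name} is waiting ({state}) "
                  f"({job_id} on '{partition}' partition)")
```
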
iterations 3342-3562 of 159576, one record per row
(constant across all 221 rows: global batch size: 32 | loss scale: 32768.0 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0; each record's trailing "time (ms)" field carries no value in the source; the final record is cut off after "loss scale: 32768.0", so its grad norm is unavailable)

iteration | consumed samples | elapsed time per iteration (ms) | learning rate | lm loss | grad norm
3342 | 59696 | 14584.3 | 1.654E-05 | 6.485311E+00 | 162130.401
3343 | 59728 | 14879.0 | 1.655E-05 | 6.461338E+00 | 284392.029
3344 | 59760 | 14679.3 | 1.656E-05 | 6.473630E+00 | 142043.769
3345 | 59792 | 14580.5 | 1.657E-05 | 6.494667E+00 | 125366.936
3346 | 59824 | 14552.3 | 1.658E-05 | 6.560155E+00 | 126654.040
3347 | 59856 | 14707.5 | 1.658E-05 | 6.462931E+00 | 123122.209
3348 | 59888 | 14897.9 | 1.659E-05 | 6.542427E+00 | 147629.605
3349 | 59920 | 14638.7 | 1.660E-05 | 6.508281E+00 | 181625.911
3350 | 59952 | 14590.8 | 1.661E-05 | 6.592540E+00 | 161023.762
3351 | 59984 | 14484.6 | 1.662E-05 | 6.474733E+00 | 125810.881
3352 | 60016 | 14782.0 | 1.663E-05 | 6.515071E+00 | 148493.240
3353 | 60048 | 14601.7 | 1.664E-05 | 6.510946E+00 | 154098.157
3354 | 60080 | 14551.7 | 1.665E-05 | 6.639778E+00 | 120125.841
3355 | 60112 | 14609.6 | 1.666E-05 | 6.582976E+00 | 125934.744
3356 | 60144 | 14773.2 | 1.666E-05 | 6.492831E+00 | 114199.841
3357 | 60176 | 14529.3 | 1.667E-05 | 6.348350E+00 | 224039.321
3358 | 60208 | 14555.6 | 1.668E-05 | 6.556470E+00 | 104992.577
3359 | 60240 | 14550.6 | 1.669E-05 | 6.499870E+00 | 135382.686
3360 | 60272 | 14838.2 | 1.670E-05 | 6.482747E+00 | 128815.280
3361 | 60304 | 14577.3 | 1.671E-05 | 6.564407E+00 | 220163.959
3362 | 60336 | 14600.9 | 1.672E-05 | 6.561186E+00 | 110111.851
3363 | 60368 | 14665.2 | 1.673E-05 | 6.624823E+00 | 119091.671
3364 | 60400 | 14799.6 | 1.674E-05 | 6.572470E+00 | 157986.014
3365 | 60432 | 14663.0 | 1.674E-05 | 6.613792E+00 | 103982.392
3366 | 60464 | 14481.2 | 1.675E-05 | 6.387408E+00 | 158220.191
3367 | 60496 | 14521.1 | 1.676E-05 | 6.515392E+00 | 123622.262
3368 | 60528 | 15053.7 | 1.677E-05 | 6.568096E+00 | 255456.390
3369 | 60560 | 14696.0 | 1.678E-05 | 6.553046E+00 | 144928.658
3370 | 60592 | 14594.8 | 1.679E-05 | 6.341058E+00 | 190527.660
3371 | 60624 | 14611.4 | 1.680E-05 | 6.406933E+00 | 164464.189
3372 | 60656 | 14997.7 | 1.681E-05 | 6.472693E+00 | 140499.535
3373 | 60688 | 14555.5 | 1.682E-05 | 6.472823E+00 | 209200.754
3374 | 60720 | 14538.5 | 1.682E-05 | 6.575472E+00 | 152311.741
3375 | 60752 | 14542.0 | 1.683E-05 | 6.559402E+00 | 139207.917
3376 | 60784 | 14908.5 | 1.684E-05 | 6.450352E+00 | 132808.916
3377 | 60816 | 14576.3 | 1.685E-05 | 6.365215E+00 | 176292.223
3378 | 60848 | 14602.1 | 1.686E-05 | 6.443403E+00 | 123052.341
3379 | 60880 | 14651.7 | 1.687E-05 | 6.502498E+00 | 100381.015
3380 | 60912 | 14854.4 | 1.688E-05 | 6.296595E+00 | 110161.712
3381 | 60944 | 14541.8 | 1.689E-05 | 6.563570E+00 | 88591.921
3382 | 60976 | 14608.6 | 1.689E-05 | 6.582268E+00 | 114214.218
3383 | 61008 | 14527.6 | 1.690E-05 | 6.577205E+00 | 122437.521
3384 | 61040 | 14914.6 | 1.691E-05 | 6.428950E+00 | 125848.491
3385 | 61072 | 14662.1 | 1.692E-05 | 6.677817E+00 | 110496.306
3386 | 61104 | 14566.3 | 1.693E-05 | 6.704777E+00 | 128540.385
3387 | 61136 | 14563.5 | 1.694E-05 | 6.578674E+00 | 143780.108
3388 | 61168 | 14890.7 | 1.695E-05 | 6.503931E+00 | 144574.375
3389 | 61200 | 14672.5 | 1.696E-05 | 6.662019E+00 | 158358.181
3390 | 61232 | 14563.8 | 1.697E-05 | 6.577336E+00 | 198110.226
3391 | 61264 | 14556.6 | 1.697E-05 | 6.480102E+00 | 131120.843
3392 | 61296 | 14679.5 | 1.698E-05 | 6.610832E+00 | 164581.156
3393 | 61328 | 14940.6 | 1.699E-05 | 6.591301E+00 | 109544.075
3394 | 61360 | 14592.5 | 1.700E-05 | 6.572402E+00 | 121937.240
3395 | 61392 | 14696.4 | 1.701E-05 | 6.509333E+00 | 125128.401
3396 | 61424 | 14508.0 | 1.702E-05 | 6.481079E+00 | 111910.670
3397 | 61456 | 14790.4 | 1.703E-05 | 6.548109E+00 | 98717.457
3398 | 61488 | 14622.0 | 1.704E-05 | 6.769459E+00 | 117754.948
3399 | 61520 | 14611.9 | 1.705E-05 | 6.555518E+00 | 122435.114
3400 | 61552 | 14673.6 | 1.705E-05 | 6.464739E+00 | 119112.548
3401 | 61584 | 14910.7 | 1.706E-05 | 6.473111E+00 | 113410.819
3402 | 61616 | 14645.2 | 1.707E-05 | 6.476302E+00 | 113730.379
3403 | 61648 | 14580.6 | 1.708E-05 | 6.449226E+00 | 82819.459
3404 | 61680 | 14600.7 | 1.709E-05 | 6.560233E+00 | 134696.405
3405 | 61712 | 14772.7 | 1.710E-05 | 6.546908E+00 | 101163.521
3406 | 61744 | 14593.3 | 1.711E-05 | 6.541033E+00 | 109699.529
3407 | 61776 | 14624.0 | 1.712E-05 | 6.511957E+00 | 91123.954
3408 | 61808 | 14724.5 | 1.713E-05 | 6.628172E+00 | 121584.252
3409 | 61840 | 15120.6 | 1.713E-05 | 6.578444E+00 | 116757.586
3410 | 61872 | 14619.5 | 1.714E-05 | 6.415488E+00 | 105815.444
3411 | 61904 | 14577.8 | 1.715E-05 | 6.553544E+00 | 104053.489
3412 | 61936 | 14587.5 | 1.716E-05 | 6.435183E+00 | 101905.898
3413 | 61968 | 14985.9 | 1.717E-05 | 6.580218E+00 | 142325.290
3414 | 62000 | 14646.8 | 1.718E-05 | 6.534802E+00 | 109771.164
3415 | 62032 | 14644.6 | 1.719E-05 | 6.582119E+00 | 192056.720
3416 | 62064 | 14616.1 | 1.720E-05 | 6.496407E+00 | 118953.837
3417 | 62096 | 15113.2 | 1.721E-05 | 6.475505E+00 | 173828.473
3418 | 62128 | 14635.6 | 1.721E-05 | 6.318462E+00 | 147925.562
3419 | 62160 | 14611.3 | 1.722E-05 | 6.571759E+00 | 112885.924
3420 | 62192 | 14573.5 | 1.723E-05 | 6.461047E+00 | 135373.791
3421 | 62224 | 14978.7 | 1.724E-05 | 6.554849E+00 | 162048.264
3422 | 62256 | 14574.6 | 1.725E-05 | 6.443440E+00 | 103393.805
3423 | 62288 | 14578.8 | 1.726E-05 | 6.490220E+00 | 217891.504
3424 | 62320 | 14669.3 | 1.727E-05 | 6.475744E+00 | 132019.548
3425 | 62352 | 15003.7 | 1.728E-05 | 6.639316E+00 | 118549.933
3426 | 62384 | 14473.5 | 1.729E-05 | 6.529860E+00 | 110134.510
3427 | 62416 | 14593.0 | 1.729E-05 | 6.424025E+00 | 96948.279
3428 | 62448 | 14574.8 | 1.730E-05 | 6.603945E+00 | 108813.419
3429 | 62480 | 14962.4 | 1.731E-05 | 6.519920E+00 | 120997.396
3430 | 62512 | 14606.5 | 1.732E-05 | 6.519583E+00 | 102226.597
3431 | 62544 | 14685.5 | 1.733E-05 | 6.413152E+00 | 146442.757
3432 | 62576 | 14642.7 | 1.734E-05 | 6.416885E+00 | 106692.633
3433 | 62608 | 14943.4 | 1.735E-05 | 6.684166E+00 | 122647.780
3434 | 62640 | 14559.8 | 1.736E-05 | 6.582661E+00 | 143037.633
3435 | 62672 | 14581.0 | 1.737E-05 | 6.459047E+00 | 139754.449
3436 | 62704 | 14594.3 | 1.737E-05 | 6.455495E+00 | 199133.358
3437 | 62736 | 14983.6 | 1.738E-05 | 6.507184E+00 | 193681.925
3438 | 62768 | 14797.2 | 1.739E-05 | 6.461359E+00 | 132732.709
3439 | 62800 | 14579.8 | 1.740E-05 | 6.704415E+00 | 113391.882
3440 | 62832 | 14621.6 | 1.741E-05 | 6.473897E+00 | 120849.572
3441 | 62864 | 14686.1 | 1.742E-05 | 6.459955E+00 | 128216.917
3442 | 62896 | 14857.9 | 1.743E-05 | 6.424060E+00 | 102672.871
3443 | 62928 | 14570.1 | 1.744E-05 | 6.534360E+00 | 184877.887
3444 | 62960 | 14620.2 | 1.745E-05 | 6.629717E+00 | 138408.073
3445 | 62992 | 14619.1 | 1.745E-05 | 6.494986E+00 | 131634.897
3446 | 63024 | 14739.8 | 1.746E-05 | 6.529834E+00 | 190204.428
3447 | 63056 | 14575.9 | 1.747E-05 | 6.519164E+00 | 190893.633
3448 | 63088 | 14611.0 | 1.748E-05 | 6.431557E+00 | 127326.623
3449 | 63120 | 14615.1 | 1.749E-05 | 6.213955E+00 | 149485.955
3450 | 63152 | 14697.2 | 1.750E-05 | 6.669972E+00 | 121418.512
3451 | 63184 | 14506.2 | 1.751E-05 | 6.538607E+00 | 160228.418
3452 | 63216 | 14518.4 | 1.752E-05 | 6.466623E+00 | 132558.400
3453 | 63248 | 14654.4 | 1.753E-05 | 6.575057E+00 | 126715.953
3454 | 63280 | 14975.6 | 1.753E-05 | 6.469002E+00 | 134315.470
3455 | 63312 | 14595.3 | 1.754E-05 | 6.471159E+00 | 132183.538
3456 | 63344 | 14624.6 | 1.755E-05 | 6.390759E+00 | 168993.753
3457 | 63376 | 14611.9 | 1.756E-05 | 6.545074E+00 | 116907.132
3458 | 63408 | 14991.7 | 1.757E-05 | 6.541002E+00 | 144421.845
3459 | 63440 | 14690.5 | 1.758E-05 | 6.549660E+00 | 177618.434
3460 | 63472 | 14572.5 | 1.759E-05 | 6.509130E+00 | 102216.190
3461 | 63504 | 14630.9 | 1.760E-05 | 6.474805E+00 | 198903.879
3462 | 63536 | 14903.4 | 1.761E-05 | 6.343786E+00 | 142714.038
3463 | 63568 | 14638.9 | 1.761E-05 | 6.644784E+00 | 158591.280
3464 | 63600 | 14613.0 | 1.762E-05 | 6.625895E+00 | 123320.343
3465 | 63632 | 14585.1 | 1.763E-05 | 6.575481E+00 | 175492.554
3466 | 63664 | 15007.9 | 1.764E-05 | 6.510527E+00 | 141462.343
3467 | 63696 | 14658.4 | 1.765E-05 | 6.281921E+00 | 133404.006
3468 | 63728 | 14580.1 | 1.766E-05 | 6.438425E+00 | 155340.501
3469 | 63760 | 14575.6 | 1.767E-05 | 6.527649E+00 | 99587.133
3470 | 63792 | 14895.6 | 1.768E-05 | 6.196751E+00 | 208702.232
3471 | 63824 | 14601.7 | 1.768E-05 | 6.487125E+00 | 168900.933
3472 | 63856 | 14566.0 | 1.769E-05 | 6.509688E+00 | 154921.949
3473 | 63888 | 14575.1 | 1.770E-05 | 6.622843E+00 | 140472.596
3474 | 63920 | 14877.5 | 1.771E-05 | 6.475362E+00 | 119718.275
3475 | 63952 | 14552.0 | 1.772E-05 | 6.465285E+00 | 172671.121
3476 | 63984 | 14582.7 | 1.773E-05 | 6.389154E+00 | 113417.369
3477 | 64016 | 14606.6 | 1.774E-05 | 6.582153E+00 | 139244.123
3478 | 64048 | 14915.2 | 1.775E-05 | 6.490180E+00 | 94281.862
3479 | 64080 | 14555.1 | 1.776E-05 | 6.683810E+00 | 149137.080
3480 | 64112 | 14553.1 | 1.776E-05 | 6.534214E+00 | 129169.136
3481 | 64144 | 14603.3 | 1.777E-05 | 6.581446E+00 | 115991.644
3482 | 64176 | 14916.9 | 1.778E-05 | 6.567008E+00 | 184960.532
3483 | 64208 | 14481.2 | 1.779E-05 | 6.662760E+00 | 134077.108
3484 | 64240 | 14567.5 | 1.780E-05 | 6.589795E+00 | 126611.070
3485 | 64272 | 14495.3 | 1.781E-05 | 6.497936E+00 | 122115.644
3486 | 64304 | 14568.8 | 1.782E-05 | 6.558665E+00 | 126373.837
3487 | 64336 | 14913.4 | 1.783E-05 | 6.431637E+00 | 161636.464
3488 | 64368 | 14528.7 | 1.784E-05 | 6.356628E+00 | 114700.134
3489 | 64400 | 14522.5 | 1.784E-05 | 6.470509E+00 | 157358.888
3490 | 64432 | 14512.2 | 1.785E-05 | 6.580731E+00 | 124839.092
3491 | 64464 | 14760.8 | 1.786E-05 | 6.545910E+00 | 225734.887
3492 | 64496 | 14465.1 | 1.787E-05 | 6.462240E+00 | 157153.606
3493 | 64528 | 14555.7 | 1.788E-05 | 6.526244E+00 | 134834.105
3494 | 64560 | 14523.5 | 1.789E-05 | 6.464767E+00 | 111080.299
3495 | 64592 | 14680.5 | 1.790E-05 | 6.498696E+00 | 149926.493
3496 | 64624 | 14537.6 | 1.791E-05 | 6.801207E+00 | 169978.323
3497 | 64656 | 14576.8 | 1.792E-05 | 6.458578E+00 | 128624.834
3498 | 64688 | 14451.0 | 1.792E-05 | 6.562904E+00 | 201818.910
3499 | 64720 | 14843.4 | 1.793E-05 | 6.620703E+00 | 136369.889
3500 | 64752 | 14591.5 | 1.794E-05 | 6.545550E+00 | 169642.276
3501 | 64784 | 14557.9 | 1.795E-05 | 6.401666E+00 | 152333.231
3502 | 64816 | 14554.3 | 1.796E-05 | 6.776519E+00 | 234394.263
3503 | 64848 | 14868.0 | 1.797E-05 | 6.465873E+00 | 117665.279
3504 | 64880 | 14552.4 | 1.798E-05 | 6.534934E+00 | 205418.453
3505 | 64912 | 14532.4 | 1.799E-05 | 6.777419E+00 | 156642.326
3506 | 64944 | 14549.9 | 1.800E-05 | 6.528007E+00 | 168324.988
3507 | 64976 | 14947.6 | 1.800E-05 | 6.669527E+00 | 116164.306
3508 | 65008 | 14485.1 | 1.801E-05 | 6.649974E+00 | 195968.521
3509 | 65040 | 14549.4 | 1.802E-05 | 6.636446E+00 | 135969.732
3510 | 65072 | 14546.9 | 1.803E-05 | 6.529005E+00 | 225903.317
3511 | 65104 | 14847.8 | 1.804E-05 | 6.629415E+00 | 130652.559
3512 | 65136 | 14520.0 | 1.805E-05 | 6.599288E+00 | 149863.059
3513 | 65168 | 14651.1 | 1.806E-05 | 6.592654E+00 | 166996.968
3514 | 65200 | 14479.3 | 1.807E-05 | 6.540200E+00 | 115498.690
3515 | 65232 | 14930.0 | 1.808E-05 | 6.488201E+00 | 217689.196
3516 | 65264 | 14459.8 | 1.808E-05 | 6.478746E+00 | 131460.444
3517 | 65296 | 14524.9 | 1.809E-05 | 6.658568E+00 | 186540.119
3518 | 65328 | 14525.2 | 1.810E-05 | 6.641760E+00 | 215453.929
3519 | 65360 | 14903.9 | 1.811E-05 | 6.578794E+00 | 129785.760
3520 | 65392 | 14710.5 | 1.812E-05 | 6.623507E+00 | 120935.963
3521 | 65424 | 14520.7 | 1.813E-05 | 6.597843E+00 | 116244.009
3522 | 65456 | 14597.0 | 1.814E-05 | 6.504926E+00 | 134767.376
3523 | 65488 | 14942.9 | 1.815E-05 | 6.435289E+00 | 86682.164
3524 | 65520 | 14654.2 | 1.816E-05 | 6.594196E+00 | 134027.315
3525 | 65552 | 14562.7 | 1.816E-05 | 6.679243E+00 | 125221.442
3526 | 65584 | 14630.7 | 1.817E-05 | 6.456674E+00 | 86112.712
3527 | 65616 | 14493.8 | 1.818E-05 | 6.600234E+00 | 300729.659
3528 | 65648 | 14813.0 | 1.819E-05 | 6.399897E+00 | 153878.237
3529 | 65680 | 14593.6 | 1.820E-05 | 6.540657E+00 | 150860.243
3530 | 65712 | 14559.8 | 1.821E-05 | 6.503862E+00 | 149193.561
3531 | 65744 | 14581.4 | 1.822E-05 | 6.692787E+00 | 207812.798
3532 | 65776 | 14715.5 | 1.823E-05 | 6.484317E+00 | 161092.514
3533 | 65808 | 14610.9 | 1.824E-05 | 6.475138E+00 | 155421.456
3534 | 65840 | 14445.3 | 1.824E-05 | 6.511703E+00 | 114681.720
3535 | 65872 | 14477.9 | 1.825E-05 | 6.509159E+00 | 183050.824
3536 | 65904 | 14816.2 | 1.826E-05 | 6.497670E+00 | 96091.191
3537 | 65936 | 14439.5 | 1.827E-05 | 6.505747E+00 | 140156.886
3538 | 65968 | 14594.1 | 1.828E-05 | 6.516546E+00 | 97276.324
3539 | 66000 | 14531.0 | 1.829E-05 | 6.589782E+00 | 283362.362
3540 | 66032 | 14766.1 | 1.830E-05 | 6.457118E+00 | 119093.566
3541 | 66064 | 14538.8 | 1.831E-05 | 6.543458E+00 | 143270.575
3542 | 66096 | 14503.8 | 1.832E-05 | 6.549830E+00 | 146934.297
3543 | 66128 | 14525.1 | 1.832E-05 | 6.523373E+00 | 246079.782
3544 | 66160 | 14836.5 | 1.833E-05 | 6.484323E+00 | 150473.482
3545 | 66192 | 14612.1 | 1.834E-05 | 6.596731E+00 | 157995.993
3546 | 66224 | 14518.2 | 1.835E-05 | 6.564546E+00 | 164874.723
3547 | 66256 | 14501.0 | 1.836E-05 | 6.427078E+00 | 175876.651
3548 | 66288 | 14899.9 | 1.837E-05 | 6.488606E+00 | 198886.829
3549 | 66320 | 14520.6 | 1.838E-05 | 6.462682E+00 | 127675.702
3550 | 66352 | 14447.8 | 1.839E-05 | 6.652044E+00 | 140944.667
3551 | 66384 | 14467.2 | 1.839E-05 | 6.520955E+00 | 86094.102
3552 | 66416 | 14808.2 | 1.840E-05 | 6.429432E+00 | 116647.112
3553 | 66448 | 14503.5 | 1.841E-05 | 6.463936E+00 | 118564.730
3554 | 66480 | 14502.1 | 1.842E-05 | 6.458220E+00 | 112013.908
3555 | 66512 | 14486.2 | 1.843E-05 | 6.492205E+00 | 95075.794
3556 | 66544 | 14873.1 | 1.844E-05 | 6.582590E+00 | 160024.973
3557 | 66576 | 14487.7 | 1.845E-05 | 6.504139E+00 | 102536.359
3558 | 66608 | 14571.2 | 1.846E-05 | 6.514203E+00 | 221229.679
3559 | 66640 | 14451.0 | 1.847E-05 | 6.560319E+00 | 131012.754
3560 | 66672 | 14938.1 | 1.847E-05 | 6.372297E+00 | 139056.836
3561 | 66704 | 14523.1 | 1.848E-05 | 6.416655E+00 | 147497.179
3562 | 66736 | 14487.9 | 1.849E-05 | 6.474949E+00 | (truncated in source)
grad norm: 174437.813 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3563/ 159576 | consumed samples: 66768 | elapsed time per iteration (ms): 14468.9 | learning rate: 1.850E-05 | global batch size: 32 | lm loss: 6.623423E+00 | loss scale: 32768.0 | grad norm: 122791.597 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3564/ 159576 | consumed samples: 66800 | elapsed time per iteration (ms): 14508.1 | learning rate: 1.851E-05 | global batch size: 32 | lm loss: 6.516719E+00 | loss scale: 32768.0 | grad norm: 125896.178 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3565/ 159576 | consumed samples: 66832 | elapsed time per iteration (ms): 14821.3 | learning rate: 1.852E-05 | global batch size: 32 | lm loss: 6.567136E+00 | loss scale: 32768.0 | grad norm: 156146.827 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3566/ 159576 | consumed samples: 66864 | elapsed time per iteration (ms): 14550.7 | learning rate: 1.853E-05 | global batch size: 32 | lm loss: 6.464426E+00 | loss scale: 32768.0 | grad norm: 112089.852 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3567/ 159576 | consumed samples: 66896 | elapsed time per iteration (ms): 14483.3 | learning rate: 1.854E-05 | global batch size: 32 | lm loss: 6.330031E+00 | loss scale: 32768.0 | grad norm: 100672.150 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3568/ 159576 | consumed samples: 66928 | elapsed time per iteration (ms): 14573.3 | learning rate: 1.855E-05 | global batch size: 32 | lm loss: 6.472744E+00 | loss scale: 32768.0 | grad norm: 206164.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3569/ 159576 | consumed samples: 66960 | elapsed time per iteration (ms): 14778.2 | learning rate: 1.855E-05 | global batch size: 32 | lm loss: 6.502261E+00 | loss scale: 32768.0 | grad norm: 117741.940 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3570/ 159576 | consumed samples: 66992 | elapsed time per iteration (ms): 14563.8 | learning rate: 1.856E-05 | global batch size: 32 | lm loss: 6.480472E+00 | loss scale: 32768.0 | grad norm: 180667.970 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3571/ 159576 | consumed samples: 67024 | elapsed time per iteration (ms): 14517.4 | learning rate: 1.857E-05 | global batch size: 32 | lm loss: 6.653479E+00 | loss scale: 32768.0 | grad norm: 121625.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3572/ 159576 | consumed samples: 67056 | elapsed time per iteration (ms): 14532.0 | learning rate: 1.858E-05 | global batch size: 32 | lm loss: 6.478413E+00 | loss scale: 32768.0 | grad norm: 135823.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3573/ 159576 | consumed samples: 67088 | elapsed time per iteration (ms): 14807.4 | learning rate: 1.859E-05 | global batch size: 32 | lm loss: 6.589501E+00 | loss scale: 32768.0 | grad norm: 147763.903 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3574/ 159576 | consumed samples: 67120 | elapsed time per 
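Every progress record in this dump follows the same fixed, pipe-delimited Megatron-LM format summarized above, so the whole section is machine-readable. A minimal parsing sketch for pulling such records out of a raw dump; parse_log and the field names are my own, hypothetical helpers, not part of the training code:

import re

# One Megatron-LM-style progress record, e.g.
#   iteration 3529/ 159576 | consumed samples: 65680 | elapsed time per
#   iteration (ms): 14593.6 | learning rate: 1.820E-05 | ... | grad norm: ...
RECORD = re.compile(
    r"iteration\s+(\d+)/\s*(\d+)\s*\|"
    r".*?consumed samples:\s*(\d+)"
    r".*?elapsed time per iteration \(ms\):\s*([\d.]+)"
    r".*?learning rate:\s*([\dEe+.\-]+)"
    r".*?lm loss:\s*([\dEe+.\-]+)"
    r".*?loss scale:\s*([\d.]+)"
    r".*?grad norm:\s*([\d.]+)"
)

def parse_log(text):
    """Yield one dict per training iteration found in a raw log dump."""
    text = " ".join(text.split())  # collapse line wraps inside records
    for m in RECORD.finditer(text):
        yield {
            "iteration": int(m.group(1)),
            "total_iterations": int(m.group(2)),
            "consumed_samples": int(m.group(3)),
            "ms_per_iteration": float(m.group(4)),
            "learning_rate": float(m.group(5)),
            "lm_loss": float(m.group(6)),
            "loss_scale": float(m.group(7)),
            "grad_norm": float(m.group(8)),
        }

The lazy ".*?" pieces keep each field bound to its own record because the fields always appear once, in the same order, per iteration.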
[training log, iterations 3574-3578: learning rate 1.860E-05 -> 1.863E-05; lm loss 6.47-6.63; loss scale 32768.0; grad norm ~86k-180k]
[2021-09-24 16:06:53] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition)
[2021-09-24 16:06:53] PULSE: tr8-104B is running for 10:14:42 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8]))
[training log, iterations 3579-3584: learning rate 1.864E-05 -> 1.869E-05; lm loss 6.40-6.67; grad norm ~109k-159k]
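A quick back-of-the-envelope check on that PULSE snapshot: at the ~14.5 s per iteration reported in this stretch, the remaining 159576 - 3584 = 155992 iterations would take about 155992 x 14.5 s ≈ 2.26E+06 s, roughly 26 days of wall clock. That assumes the pace and the 32-sample global batch stay fixed, which the consumed-samples arithmetic earlier in the log (65648 samples over 3528 iterations averages under 19 per step) already shows was not true of the past, so treat it as an order-of-magnitude figure only.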
[training log, iterations 3585-3746: consumed samples 67472 -> 72624; elapsed time per iteration ~14.5-15.1 s; learning rate 1.870E-05 -> 2.013E-05; global batch size 32; lm loss drifting slowly down, ranging 6.22-6.71; loss scale 32768.0; grad norm roughly 93k-352k with spikes above 2E5 at e.g. iterations 3601, 3624, 3642 and 3708; num zeros 0.0; no skipped or nan iterations]
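Across this whole stretch the learning rate climbs by about 1E-08 per iteration, i.e. linearly in consumed samples while the batch size is flat, which is what a Megatron-style linear warmup measured in samples produces. A sketch of that schedule; the two constants are illustrative guesses, not the run's verified configuration:

def warmup_lr(consumed_samples, lr_max=6e-5, warmup_samples=216_320):
    """Linear warmup over consumed samples (Megatron-style).
    lr_max and warmup_samples are illustrative placeholders."""
    return lr_max * min(1.0, consumed_samples / warmup_samples)

# e.g. warmup_lr(65_648) ≈ 1.82e-05, in the ballpark of the 1.819E-05
# logged at iteration 3528 at the top of this section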
norm: 145785.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3747/ 159576 | consumed samples: 72656 | elapsed time per iteration (ms): 14508.2 | learning rate: 2.013E-05 | global batch size: 32 | lm loss: 6.505124E+00 | loss scale: 32768.0 | grad norm: 159039.833 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3748/ 159576 | consumed samples: 72688 | elapsed time per iteration (ms): 14534.8 | learning rate: 2.014E-05 | global batch size: 32 | lm loss: 6.554813E+00 | loss scale: 32768.0 | grad norm: 144107.066 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3749/ 159576 | consumed samples: 72720 | elapsed time per iteration (ms): 14885.2 | learning rate: 2.015E-05 | global batch size: 32 | lm loss: 6.509037E+00 | loss scale: 32768.0 | grad norm: 139312.960 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3750/ 159576 | consumed samples: 72752 | elapsed time per iteration (ms): 14531.0 | learning rate: 2.016E-05 | global batch size: 32 | lm loss: 6.393044E+00 | loss scale: 32768.0 | grad norm: 177829.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3751/ 159576 | consumed samples: 72784 | elapsed time per iteration (ms): 14500.7 | learning rate: 2.017E-05 | global batch size: 32 | lm loss: 6.362189E+00 | loss scale: 32768.0 | grad norm: 176679.914 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3752/ 159576 | consumed samples: 72816 | elapsed time per iteration (ms): 14533.8 | learning rate: 2.018E-05 | global batch size: 32 | lm loss: 6.594802E+00 | loss scale: 32768.0 | grad norm: 172136.738 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3753/ 159576 | consumed samples: 72848 | elapsed time per iteration (ms): 7743.9 | learning rate: 2.018E-05 | global batch size: 32 | lm loss: 6.535247E+00 | loss scale: 32768.0 | grad norm: 172136.738 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3754/ 159576 | consumed samples: 72880 | elapsed time per iteration (ms): 14383.1 | learning rate: 2.019E-05 | global batch size: 32 | lm loss: 6.354399E+00 | loss scale: 32768.0 | grad norm: 126648.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3755/ 159576 | consumed samples: 72912 | elapsed time per iteration (ms): 14590.3 | learning rate: 2.020E-05 | global batch size: 32 | lm loss: 6.473662E+00 | loss scale: 32768.0 | grad norm: 156295.152 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3756/ 159576 | consumed samples: 72944 | elapsed time per iteration (ms): 7767.7 | learning rate: 2.020E-05 | global batch size: 32 | lm loss: 6.609807E+00 | loss scale: 16384.0 | grad norm: 156295.152 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3757/ 159576 | consumed samples: 72976 | elapsed time per iteration (ms): 14046.4 | learning rate: 2.021E-05 | global batch size: 32 | lm loss: 6.389218E+00 | loss scale: 16384.0 | grad norm: 71738.658 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3758/ 159576 | consumed samples: 73008 | elapsed time per iteration 
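In the rows above, iterations 3753 and 3756 complete in roughly half the usual time while repeating the previous iteration's grad norm, and at 3756 the loss scale drops from 32768.0 to 16384.0. That pattern is the usual signature of fp16 dynamic loss scaling backing off after a gradient overflow: the step is abandoned and the scale is halved. A minimal sketch of the mechanism, assuming illustrative names and backoff/growth constants (this is not the Megatron-LM/DeepSpeed implementation):

class DynamicLossScaler:
    """Toy fp16 dynamic loss scaler: halve on overflow, grow after a clean run."""

    def __init__(self, init_scale=32768.0, backoff=0.5, growth=2.0, growth_interval=1000):
        self.scale = init_scale
        self.backoff = backoff              # factor applied on overflow
        self.growth = growth                # factor applied after growth_interval clean steps
        self.growth_interval = growth_interval
        self._clean_steps = 0

    def update(self, found_overflow):
        """Return True if the optimizer step should be skipped."""
        if found_overflow:
            self.scale *= self.backoff      # e.g. 32768.0 -> 16384.0, as at iteration 3756
            self._clean_steps = 0
            return True                     # gradients are unusable this step
        self._clean_steps += 1
        if self._clean_steps % self.growth_interval == 0:
            self.scale *= self.growth
        return False

scaler = DynamicLossScaler()
assert scaler.update(found_overflow=True) and scaler.scale == 16384.0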
iteration (of 159576) | consumed samples | elapsed ms/iter | learning rate | lm loss | loss scale | grad norm
3759 | 73040 | 14722.8 | 2.022E-05 | 6.447733E+00 | 16384.0 | 87663.180
3760 | 73072 | 14583.0 | 2.023E-05 | 6.446470E+00 | 16384.0 | 67781.743
3761 | 73104 | 14493.9 | 2.024E-05 | 6.378415E+00 | 16384.0 | 72177.747
3762 | 73136 | 14567.8 | 2.025E-05 | 6.576702E+00 | 16384.0 | 87501.793
3763 | 73168 | 14732.6 | 2.026E-05 | 6.522850E+00 | 16384.0 | 66784.841
3764 | 73200 | 14572.5 | 2.027E-05 | 6.361198E+00 | 16384.0 | 85761.754
3765 | 73232 | 14647.5 | 2.028E-05 | 6.605127E+00 | 16384.0 | 69863.144
3766 | 73264 | 14606.0 | 2.029E-05 | 6.398610E+00 | 16384.0 | 94809.931
3767 | 73296 | 14708.7 | 2.029E-05 | 6.484084E+00 | 16384.0 | 74741.244
3768 | 73328 | 14555.4 | 2.030E-05 | 6.496735E+00 | 16384.0 | 77000.443
3769 | 73360 | 14556.9 | 2.031E-05 | 6.386226E+00 | 16384.0 | 92155.881
3770 | 73392 | 14623.6 | 2.032E-05 | 6.446381E+00 | 16384.0 | 91554.158
3771 | 73424 | 14736.8 | 2.033E-05 | 6.477424E+00 | 16384.0 | 79287.956
3772 | 73456 | 14586.8 | 2.034E-05 | 6.505037E+00 | 16384.0 | 76395.186
3773 | 73488 | 14638.2 | 2.035E-05 | 6.536213E+00 | 16384.0 | 64411.593
3774 | 73520 | 14533.1 | 2.036E-05 | 6.477271E+00 | 16384.0 | 79531.325
3775 | 73552 | 14956.5 | 2.037E-05 | 6.364020E+00 | 16384.0 | 72312.067
3776 | 73584 | 14572.0 | 2.037E-05 | 6.331044E+00 | 16384.0 | 84164.363
3777 | 73616 | 14594.9 | 2.038E-05 | 6.512950E+00 | 16384.0 | 77822.381
3778 | 73648 | 14607.5 | 2.039E-05 | 6.549839E+00 | 16384.0 | 66443.545
3779 | 73680 | 14999.4 | 2.040E-05 | 6.475536E+00 | 16384.0 | 88572.452
3780 | 73712 | 14681.3 | 2.041E-05 | 6.548042E+00 | 16384.0 | 74648.598
3781 | 73744 | 14610.5 | 2.042E-05 | 6.445394E+00 | 16384.0 | 79663.249
3782 | 73776 | 14624.0 | 2.043E-05 | 6.496744E+00 | 16384.0 | 77740.129
3783 | 73808 | 15155.7 | 2.044E-05 | 6.402834E+00 | 16384.0 | 74857.589
3784 | 73840 | 14584.9 | 2.045E-05 | 6.375038E+00 | 16384.0 | 86117.345
3785 | 73872 | 14634.8 | 2.045E-05 | 6.507965E+00 | 16384.0 | 78691.029
3786 | 73904 | 14635.7 | 2.046E-05 | 6.375463E+00 | 16384.0 | 105222.970
3787 | 73936 | 14981.3 | 2.047E-05 | 6.494486E+00 | 16384.0 | 70745.031
3788 | 73968 | 14576.6 | 2.048E-05 | 6.350873E+00 | 16384.0 | 81350.508
3789 | 74000 | 14674.5 | 2.049E-05 | 6.467069E+00 | 16384.0 | 84086.046
3790 | 74032 | 14585.2 | 2.050E-05 | 6.420381E+00 | 16384.0 | 79517.176
3791 | 74064 | 14845.4 | 2.051E-05 | 6.528859E+00 | 16384.0 | 87747.432
3792 | 74096 | 14671.9 | 2.052E-05 | 6.445452E+00 | 16384.0 | 76185.902
3793 | 74128 | 14614.2 | 2.053E-05 | 6.579043E+00 | 16384.0 | 85891.467
3794 | 74160 | 14636.7 | 2.053E-05 | 6.481782E+00 | 16384.0 | 62633.733
3795 | 74192 | 14963.5 | 2.054E-05 | 6.517486E+00 | 16384.0 | 67403.184
3796 | 74224 | 14620.1 | 2.055E-05 | 6.417095E+00 | 16384.0 | 62157.167
3797 | 74256 | 14620.8 | 2.056E-05 | 6.419306E+00 | 16384.0 | 73456.568
3798 | 74288 | 14577.9 | 2.057E-05 | 6.487021E+00 | 16384.0 | 67613.601
3799 | 74320 | 14963.8 | 2.058E-05 | 6.459682E+00 | 16384.0 | 73515.295
3800 | 74352 | 14567.9 | 2.059E-05 | 6.321566E+00 | 16384.0 | 77546.231
3801 | 74384 | 14600.7 | 2.060E-05 | 6.582398E+00 | 16384.0 | 78424.143
3802 | 74416 | 14644.4 | 2.061E-05 | 6.394701E+00 | 16384.0 | 82174.617
3803 | 74448 | 14905.7 | 2.061E-05 | 6.388845E+00 | 16384.0 | 67050.595
3804 | 74480 | 14636.0 | 2.062E-05 | 6.513092E+00 | 16384.0 | 118423.488
3805 | 74512 | 14511.9 | 2.063E-05 | 6.418696E+00 | 16384.0 | 71096.098
3806 | 74544 | 14523.9 | 2.064E-05 | 6.286570E+00 | 16384.0 | 93004.901
3807 | 74576 | 14509.8 | 2.065E-05 | 6.565314E+00 | 16384.0 | 76207.242
3808 | 74608 | 15001.7 | 2.066E-05 | 6.597963E+00 | 16384.0 | 136405.382
3809 | 74640 | 14540.5 | 2.067E-05 | 6.619783E+00 | 16384.0 | 75270.102
3810 | 74672 | 14582.3 | 2.068E-05 | 6.406981E+00 | 16384.0 | 81052.948
3811 | 74704 | 14512.1 | 2.068E-05 | 6.487488E+00 | 16384.0 | 87400.939
3812 | 74736 | 14767.4 | 2.069E-05 | 6.416305E+00 | 16384.0 | 104809.852
3813 | 74768 | 14457.6 | 2.070E-05 | 6.405777E+00 | 16384.0 | 79282.350
3814 | 74800 | 14520.7 | 2.071E-05 | 6.435395E+00 | 16384.0 | 75788.929
3815 | 74832 | 14520.3 | 2.072E-05 | 6.324138E+00 | 16384.0 | 77448.416
3816 | 74864 | 14756.0 | 2.073E-05 | 6.479269E+00 | 16384.0 | 80928.548
3817 | 74896 | 14631.8 | 2.074E-05 | 6.448977E+00 | 16384.0 | 81667.758
3818 | 74928 | 14631.1 | 2.075E-05 | 6.550106E+00 | 16384.0 | 65592.243
3819 | 74960 | 14596.0 | 2.076E-05 | 6.589513E+00 | 16384.0 | 93606.472
3820 | 74992 | 14800.0 | 2.076E-05 | 6.472552E+00 | 16384.0 | 63974.308
3821 | 75024 | 14588.9 | 2.077E-05 | 6.366886E+00 | 16384.0 | 87736.372
3822 | 75056 | 14606.9 | 2.078E-05 | 6.523769E+00 | 16384.0 | 81803.750
3823 | 75088 | 14588.5 | 2.079E-05 | 6.495326E+00 | 16384.0 | 63058.316
3824 | 75120 | 14986.3 | 2.080E-05 | 6.557096E+00 | 16384.0 | 80592.237
3825 | 75152 | 14684.5 | 2.081E-05 | 6.436917E+00 | 16384.0 | 70196.423
3826 | 75184 | 14627.4 | 2.082E-05 | 6.413104E+00 | 16384.0 | 81085.772

[2021-09-24 17:07:03] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition)
[2021-09-24 17:07:03] PULSE: tr8-104B is running for 11:14:52 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8])
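The PULSE lines above show how the run is scheduled: the live job (1162855_1) holds the listed nodes while the follow-on job waits in a throttled Slurm array (1165978_[1-10%1], where the %1 allows at most one array task to run at a time). The per-iteration timings also support a rough throughput estimate; a back-of-the-envelope sketch with values copied from the log and the simplifying assumption that the ~14.6 s/iteration pace holds:

ms_per_iter = 14600.0      # typical "elapsed ms/iter" in this span
global_batch = 32          # constant "global batch size"
total_iters = 159576       # denominator of the iteration counter
current_iter = 3826        # last row before the PULSE lines

samples_per_sec = global_batch / (ms_per_iter / 1000.0)
remaining_days = (total_iters - current_iter) * ms_per_iter / 1000.0 / 86400.0
print(f"{samples_per_sec:.2f} samples/s, ~{remaining_days:.0f} days left at this pace")
# -> about 2.19 samples/s and ~26 days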
iteration (of 159576) | consumed samples | elapsed ms/iter | learning rate | lm loss | loss scale | grad norm
3827 | 75216 | 14531.9 | 2.083E-05 | 6.427704E+00 | 16384.0 | 68943.205
3828 | 75248 | 14988.1 | 2.084E-05 | 6.347779E+00 | 16384.0 | 64095.719
3829 | 75280 | 14665.9 | 2.084E-05 | 6.411919E+00 | 16384.0 | 82008.163
3830 | 75312 | 14539.9 | 2.085E-05 | 6.458866E+00 | 16384.0 | 67971.949
3831 | 75344 | 14600.2 | 2.086E-05 | 6.450158E+00 | 16384.0 | 59376.432
3832 | 75376 | 14931.8 | 2.087E-05 | 6.537256E+00 | 16384.0 | 77538.560
3833 | 75408 | 14592.6 | 2.088E-05 | 6.392985E+00 | 16384.0 | 84275.600
3834 | 75440 | 14616.6 | 2.089E-05 | 6.512251E+00 | 16384.0 | 80167.095
3835 | 75472 | 14584.0 | 2.090E-05 | 6.467295E+00 | 16384.0 | 85124.328
3836 | 75504 | 14844.3 | 2.091E-05 | 6.514040E+00 | 16384.0 | 71539.963
3837 | 75536 | 14618.8 | 2.092E-05 | 6.519591E+00 | 16384.0 | 89173.398
3838 | 75568 | 14566.0 | 2.092E-05 | 6.447284E+00 | 16384.0 | 86030.395
3839 | 75600 | 14636.3 | 2.093E-05 | 6.369718E+00 | 16384.0 | 66275.400
3840 | 75632 | 14897.9 | 2.094E-05 | 6.467171E+00 | 16384.0 | 82043.402
3841 | 75664 | 14554.8 | 2.095E-05 | 6.458669E+00 | 16384.0 | 73761.762
3842 | 75696 | 14564.2 | 2.096E-05 | 6.516797E+00 | 16384.0 | 83647.133
3843 | 75728 | 14464.9 | 2.097E-05 | 6.381551E+00 | 16384.0 | 58297.000
3844 | 75760 | 14942.4 | 2.098E-05 | 6.471825E+00 | 16384.0 | 82881.127
3845 | 75792 | 14531.3 | 2.099E-05 | 6.528457E+00 | 16384.0 | 67296.172
3846 | 75824 | 14601.9 | 2.100E-05 | 6.408827E+00 | 16384.0 | 67512.624
3847 | 75856 | 14580.2 | 2.100E-05 | 6.440091E+00 | 16384.0 | 78400.656
3848 | 75888 | 14911.9 | 2.101E-05 | 6.374573E+00 | 16384.0 | 85886.969
3849 | 75920 | 14768.3 | 2.102E-05 | 6.529835E+00 | 16384.0 | 71394.057
3850 | 75952 | 14553.3 | 2.103E-05 | 6.455585E+00 | 16384.0 | 67772.089
3851 | 75984 | 14574.9 | 2.104E-05 | 6.428284E+00 | 16384.0 | 110864.098
3852 | 76016 | 14592.6 | 2.105E-05 | 6.457644E+00 | 16384.0 | 73499.592
3853 | 76048 | 14780.7 | 2.106E-05 | 6.459057E+00 | 16384.0 | 71503.908
3854 | 76080 | 14631.9 | 2.107E-05 | 6.522111E+00 | 16384.0 | 73205.829
3855 | 76112 | 14685.7 | 2.108E-05 | 6.444643E+00 | 16384.0 | 70169.559
3856 | 76144 | 14534.2 | 2.108E-05 | 6.392300E+00 | 16384.0 | 81224.688
3857 | 76176 | 14734.9 | 2.109E-05 | 6.474737E+00 | 16384.0 | 76429.789
3858 | 76208 | 14589.1 | 2.110E-05 | 6.481500E+00 | 16384.0 | 76288.617
3859 | 76240 | 14536.6 | 2.111E-05 | 6.504058E+00 | 16384.0 | 75104.955
3860 | 76272 | 14557.4 | 2.112E-05 | 6.616935E+00 | 16384.0 | 73471.312
3861 | 76304 | 14996.3 | 2.113E-05 | 6.437632E+00 | 16384.0 | 100626.814
3862 | 76336 | 14610.8 | 2.114E-05 | 6.358921E+00 | 16384.0 | 84367.846
3863 | 76368 | 14574.0 | 2.115E-05 | 6.489450E+00 | 16384.0 | 111308.083
3864 | 76400 | 14585.8 | 2.116E-05 | 6.579299E+00 | 16384.0 | 71685.183
3865 | 76432 | 14801.5 | 2.116E-05 | 6.356242E+00 | 16384.0 | 68636.493
3866 | 76464 | 14581.8 | 2.117E-05 | 6.583051E+00 | 16384.0 | 83498.983
3867 | 76496 | 14548.1 | 2.118E-05 | 6.414474E+00 | 16384.0 | 70120.527
3868 | 76528 | 14581.2 | 2.119E-05 | 6.383676E+00 | 16384.0 | 65625.290
3869 | 76560 | 14975.0 | 2.120E-05 | 6.553302E+00 | 16384.0 | 78443.319
3870 | 76592 | 14654.1 | 2.121E-05 | 6.525763E+00 | 16384.0 | 74575.789
3871 | 76624 | 14658.5 | 2.122E-05 | 6.416959E+00 | 16384.0 | 61001.593
3872 | 76656 | 14544.3 | 2.123E-05 | 6.516649E+00 | 16384.0 | 76582.538
3873 | 76688 | 14961.2 | 2.124E-05 | 6.532383E+00 | 16384.0 | 98540.585
3874 | 76720 | 14595.7 | 2.124E-05 | 6.589262E+00 | 16384.0 | 90020.937
3875 | 76752 | 14549.8 | 2.125E-05 | 6.475612E+00 | 16384.0 | 71253.795
3876 | 76784 | 14539.7 | 2.126E-05 | 6.477540E+00 | 16384.0 | 113904.264
3877 | 76816 | 14922.4 | 2.127E-05 | 6.475825E+00 | 16384.0 | 59736.077
3878 | 76848 | 14676.0 | 2.128E-05 | 6.477038E+00 | 16384.0 | 73926.427
3879 | 76880 | 14505.4 | 2.129E-05 | 6.577363E+00 | 16384.0 | 65273.771
3880 | 76912 | 14525.2 | 2.130E-05 | 6.431276E+00 | 16384.0 | 62353.041
3881 | 76944 | 14918.9 | 2.131E-05 | 6.471975E+00 | 16384.0 | 80402.399
3882 | 76976 | 14543.5 | 2.132E-05 | 6.481179E+00 | 16384.0 | 59241.446
3883 | 77008 | 14519.1 | 2.132E-05 | 6.356431E+00 | 16384.0 | 66124.949
3884 | 77040 | 14635.6 | 2.133E-05 | 7.171796E+00 | 16384.0 | 628102.297
3885 | 77072 | 14877.6 | 2.134E-05 | 7.122965E+00 | 16384.0 | 105361.079
3886 | 77104 | 14581.7 | 2.135E-05 | 6.781033E+00 | 16384.0 | 90805.956
3887 | 77136 | 14580.5 | 2.136E-05 | 6.824611E+00 | 16384.0 | 128888.283
3888 | 77168 | 14468.4 | 2.137E-05 | 6.773994E+00 | 16384.0 | 67441.277
3889 | 77200 | 14934.3 | 2.138E-05 | 6.845183E+00 | 16384.0 | 171660.767
3890 | 77232 | 14531.8 | 2.139E-05 | 6.803124E+00 | 16384.0 | 100767.890
3891 | 77264 | 14568.7 | 2.139E-05 | 6.825951E+00 | 16384.0 | 84326.742
3892 | 77296 | 14543.8 | 2.140E-05 | 6.734772E+00 | 16384.0 | 87236.773
3893 | 77328 | 14607.7 | 2.141E-05 | 6.789660E+00 | 16384.0 | 88054.207
3894 | 77360 | 14920.9 | 2.142E-05 | 6.710454E+00 | 16384.0 | 182978.046
3895 | 77392 | 14510.2 | 2.143E-05 | 6.691602E+00 | 16384.0 | 119037.944
3896 | 77424 | 14496.2 | 2.144E-05 | 6.739342E+00 | 16384.0 | 97461.502
3897 | 77456 | 14526.7 | 2.145E-05 | 6.818674E+00 | 16384.0 | 86334.005
3898 | 77488 | 14792.9 | 2.146E-05 | 6.717194E+00 | 16384.0 | 113951.645
3899 | 77520 | 14491.5 | 2.147E-05 | 6.714782E+00 | 16384.0 | 99766.959
3900 | 77552 | 14584.1 | 2.147E-05 | 6.659179E+00 | 16384.0 | 89663.421
3901 | 77584 | 14629.2 | 2.148E-05 | 6.615579E+00 | 16384.0 | 68957.535
3902 | 77616 | 14617.9 | 2.149E-05 | 6.606854E+00 | 16384.0 | 99968.600
3903 | 77648 | 14554.1 | 2.150E-05 | 6.537298E+00 | 16384.0 | 67921.849
3904 | 77680 | 14545.4 | 2.151E-05 | 6.606940E+00 | 16384.0 | 145573.785
3905 | 77712 | 14521.9 | 2.152E-05 | 6.625298E+00 | 16384.0 | 96778.059
3906 | 77744 | 14699.2 | 2.153E-05 | 6.624491E+00 | 16384.0 | 92738.461
3907 | 77776 | 14558.6 | 2.154E-05 | 6.825802E+00 | 16384.0 | 119492.559
3908 | 77808 | 14547.7 | 2.155E-05 | 6.591653E+00 | 16384.0 | 78761.796
3909 | 77840 | 14554.0 | 2.155E-05 | 6.567001E+00 | 16384.0 | 147075.233
3910 | 77872 | 15013.4 | 2.156E-05 | 6.787440E+00 | 16384.0 | 142314.988
3911 | 77904 | 14566.2 | 2.157E-05 | 6.525432E+00 | 16384.0 | 87369.307
3912 | 77936 | 14516.0 | 2.158E-05 | 6.615817E+00 | 16384.0 | 83904.990
3913 | 77968 | 14525.8 | 2.159E-05 | 6.564670E+00 | 16384.0 | 97516.560
3914 | 78000 | 15027.0 | 2.160E-05 | 6.400544E+00 | 16384.0 | 92743.388
3915 | 78032 | 14573.6 | 2.161E-05 | 6.603245E+00 | 16384.0 | 106541.895
3916 | 78064 | 14538.9 | 2.162E-05 | 6.560642E+00 | 16384.0 | 71313.618
3917 | 78096 | 14550.2 | 2.163E-05 | 6.578140E+00 | 16384.0 | 83812.809
3918 | 78128 | 14857.6 | 2.163E-05 | 6.583351E+00 | 16384.0 | 69616.816
3919 | 78160 | 14509.2 | 2.164E-05 | 6.595952E+00 | 16384.0 | 83133.116
3920 | 78192 | 14502.7 | 2.165E-05 | 6.645111E+00 | 16384.0 | 69570.909
3921 | 78224 | 14498.8 | 2.166E-05 | 6.553501E+00 | 16384.0 | 142896.192
3922 | 78256 | 14842.1 | 2.167E-05 | 6.687614E+00 | 16384.0 | 107346.964
3923 | 78288 | 14567.6 | 2.168E-05 | 6.764112E+00 | 16384.0 | 75484.388
3924 | 78320 | 14603.6 | 2.169E-05 | 6.384696E+00 | 16384.0 | 91570.469
3925 | 78352 | 14494.1 | 2.170E-05 | 6.148740E+00 | 16384.0 | 66094.874
3926 | 78384 | 14880.0 | 2.171E-05 | 6.492467E+00 | 16384.0 | 95980.364
3927 | 78416 | 14529.0 | 2.171E-05 | 6.634668E+00 | 16384.0 | 102240.933
3928 | 78448 | 14524.9 | 2.172E-05 | 6.542571E+00 | 16384.0 | 78190.337
3929 | 78480 | 14519.9 | 2.173E-05 | 6.546354E+00 | 16384.0 | 69181.655
3930 | 78512 | 14848.7 | 2.174E-05 | 6.556016E+00 | 16384.0 | 166890.175
3931 | 78544 | 14630.3 | 2.175E-05 | 6.575625E+00 | 16384.0 | 67026.457
3932 | 78576 | 14503.2 | 2.176E-05 | 6.528583E+00 | 16384.0 | 65300.446
3933 | 78608 | 14533.6 | 2.177E-05 | 6.571996E+00 | 16384.0 | 61530.557
3934 | 78640 | 14528.2 | 2.178E-05 | 6.524823E+00 | 16384.0 | 58107.513
3935 | 78672 | 14801.4 | 2.179E-05 | 6.627916E+00 | 16384.0 | 64798.821
3936 | 78704 | 14509.3 | 2.179E-05 | 6.511620E+00 | 16384.0 | 59258.569
3937 | 78736 | 14529.7 | 2.180E-05 | 6.414696E+00 | 16384.0 | 75598.973
3938 | 78768 | 14568.6 | 2.181E-05 | 6.692476E+00 | 16384.0 | 68594.644
3939 | 78800 | 14680.0 | 2.182E-05 | 6.509182E+00 | 16384.0 | 77431.860
3940 | 78832 | 14561.3 | 2.183E-05 | 6.521114E+00 | 16384.0 | 67107.459
3941 | 78864 | 14540.3 | 2.184E-05 | 6.557777E+00 | 16384.0 | 82252.980
3942 | 78896 | 14516.4 | 2.185E-05 | 6.519272E+00 | 16384.0 | 62956.678
3943 | 78928 | 14804.0 | 2.186E-05 | 6.436077E+00 | 16384.0 | 63372.650
3944 | 78960 | 14504.5 | 2.187E-05 | 6.536609E+00 | 16384.0 | 70623.314
3945 | 78992 | 14519.8 | 2.187E-05 | 6.631818E+00 | 16384.0 | 62267.463
3946 | 79024 | 14592.1 | 2.188E-05 | 6.263665E+00 | 16384.0 | 67107.842
3947 | 79056 | 14791.6 | 2.189E-05 | 6.622372E+00 | 16384.0 | 84764.799
3948 | 79088 | 14637.3 | 2.190E-05 | 6.395759E+00 | 16384.0 | 60113.545
3949 | 79120 | 14546.6 | 2.191E-05 | 6.588756E+00 | 16384.0 | 68679.133
3950 | 79152 | 14514.6 | 2.192E-05 | 6.484011E+00 | 16384.0 | 68729.821
3951 | 79184 | 14907.8 | 2.193E-05 | 6.496289E+00 | 16384.0 | 58918.789
3952 | 79216 | 14467.7 | 2.194E-05 | 6.442475E+00 | 16384.0 | 73240.452
3953 | 79248 | 14613.3 | 2.195E-05 | 6.412640E+00 | 16384.0 | 63495.861
3954 | 79280 | 14497.1 | 2.195E-05 | 6.419092E+00 | 16384.0 | 64832.581
3955 | 79312 | 14864.8 | 2.196E-05 | 6.411493E+00 | 16384.0 | 70227.738
3956 | 79344 | 14501.1 | 2.197E-05 | 6.377773E+00 | 16384.0 | 65521.131
3957 | 79376 | 14522.7 | 2.198E-05 | 6.458980E+00 | 16384.0 | 62294.197
3958 | 79408 | 14509.2 | 2.199E-05 | 6.540348E+00 | 16384.0 | 64994.102
3959 | 79440 | 14868.7 | 2.200E-05 | 6.503858E+00 | 16384.0 | 54271.909
3960 | 79472 | 14512.5 | 2.201E-05 | 6.372645E+00 | 16384.0 | 73237.307
3961 | 79504 | 14552.3 | 2.202E-05 | 6.396554E+00 | 16384.0 | 64579.000
3962 | 79536 | 14559.3 | 2.203E-05 | 6.556979E+00 | 16384.0 | 83489.476
3963 | 79568 | 14899.9 | 2.203E-05 | 6.458327E+00 | 16384.0 | 58716.823
3964 | 79600 | 14539.5 | 2.204E-05 | 6.802517E+00 | 16384.0 | 60731.153
3965 | 79632 | 14520.1 | 2.205E-05 | 6.616902E+00 | 16384.0 | 64155.719
| time (ms) iteration 3966/ 159576 | consumed samples: 79664 | elapsed time per iteration (ms): 14585.2 | learning rate: 2.206E-05 | global batch size: 32 | lm loss: 6.457995E+00 | loss scale: 16384.0 | grad norm: 74880.971 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3967/ 159576 | consumed samples: 79696 | elapsed time per iteration (ms): 14850.0 | learning rate: 2.207E-05 | global batch size: 32 | lm loss: 6.591904E+00 | loss scale: 16384.0 | grad norm: 75336.614 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3968/ 159576 | consumed samples: 79728 | elapsed time per iteration (ms): 14661.7 | learning rate: 2.208E-05 | global batch size: 32 | lm loss: 6.475752E+00 | loss scale: 16384.0 | grad norm: 76852.677 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3969/ 159576 | consumed samples: 79760 | elapsed time per iteration (ms): 14523.7 | learning rate: 2.209E-05 | global batch size: 32 | lm loss: 6.452621E+00 | loss scale: 16384.0 | grad norm: 65844.475 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3970/ 159576 | consumed samples: 79792 | elapsed time per iteration (ms): 14549.1 | learning rate: 2.210E-05 | global batch size: 32 | lm loss: 6.401618E+00 | loss scale: 16384.0 | grad norm: 84954.581 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3971/ 159576 | consumed samples: 79824 | elapsed time per iteration (ms): 14508.8 | learning rate: 2.211E-05 | global batch size: 32 | lm loss: 6.516178E+00 | loss scale: 16384.0 | grad norm: 71111.037 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3972/ 159576 | consumed samples: 79856 | elapsed time per iteration (ms): 14847.5 | learning rate: 2.211E-05 | global batch size: 32 | lm loss: 6.601567E+00 | loss scale: 16384.0 | grad norm: 74563.765 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3973/ 159576 | consumed samples: 79888 | elapsed time per iteration (ms): 14594.0 | learning rate: 2.212E-05 | global batch size: 32 | lm loss: 6.441951E+00 | loss scale: 16384.0 | grad norm: 72653.525 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3974/ 159576 | consumed samples: 79920 | elapsed time per iteration (ms): 14478.4 | learning rate: 2.213E-05 | global batch size: 32 | lm loss: 6.510294E+00 | loss scale: 16384.0 | grad norm: 65083.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3975/ 159576 | consumed samples: 79952 | elapsed time per iteration (ms): 14520.1 | learning rate: 2.214E-05 | global batch size: 32 | lm loss: 6.345959E+00 | loss scale: 16384.0 | grad norm: 133600.019 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3976/ 159576 | consumed samples: 79984 | elapsed time per iteration (ms): 14770.3 | learning rate: 2.215E-05 | global batch size: 32 | lm loss: 6.477483E+00 | loss scale: 16384.0 | grad norm: 89443.795 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3977/ 159576 | consumed samples: 80016 | elapsed time per iteration (ms): 14483.7 | learning rate: 2.216E-05 | global batch size: 32 | lm loss: 6.466526E+00 | loss scale: 
16384.0 | grad norm: 79203.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3978/ 159576 | consumed samples: 80048 | elapsed time per iteration (ms): 14548.9 | learning rate: 2.217E-05 | global batch size: 32 | lm loss: 6.490917E+00 | loss scale: 16384.0 | grad norm: 85035.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3979/ 159576 | consumed samples: 80080 | elapsed time per iteration (ms): 14519.8 | learning rate: 2.218E-05 | global batch size: 32 | lm loss: 6.412145E+00 | loss scale: 16384.0 | grad norm: 93580.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3980/ 159576 | consumed samples: 80112 | elapsed time per iteration (ms): 14659.7 | learning rate: 2.218E-05 | global batch size: 32 | lm loss: 6.473646E+00 | loss scale: 16384.0 | grad norm: 79422.522 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3981/ 159576 | consumed samples: 80144 | elapsed time per iteration (ms): 14525.1 | learning rate: 2.219E-05 | global batch size: 32 | lm loss: 6.522334E+00 | loss scale: 16384.0 | grad norm: 83533.865 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3982/ 159576 | consumed samples: 80176 | elapsed time per iteration (ms): 14543.1 | learning rate: 2.220E-05 | global batch size: 32 | lm loss: 6.387228E+00 | loss scale: 16384.0 | grad norm: 89795.957 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3983/ 159576 | consumed samples: 80208 | elapsed time per iteration (ms): 14609.8 | learning rate: 2.221E-05 | global batch size: 32 | lm loss: 6.475267E+00 | loss scale: 16384.0 | grad norm: 119598.589 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3984/ 159576 | consumed samples: 80240 | elapsed time per iteration (ms): 14596.2 | learning rate: 2.222E-05 | global batch size: 32 | lm loss: 6.533351E+00 | loss scale: 16384.0 | grad norm: 72306.036 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3985/ 159576 | consumed samples: 80272 | elapsed time per iteration (ms): 14621.5 | learning rate: 2.223E-05 | global batch size: 32 | lm loss: 6.540237E+00 | loss scale: 16384.0 | grad norm: 88358.505 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3986/ 159576 | consumed samples: 80304 | elapsed time per iteration (ms): 14563.8 | learning rate: 2.224E-05 | global batch size: 32 | lm loss: 6.419699E+00 | loss scale: 16384.0 | grad norm: 75411.849 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3987/ 159576 | consumed samples: 80336 | elapsed time per iteration (ms): 14555.9 | learning rate: 2.225E-05 | global batch size: 32 | lm loss: 6.591748E+00 | loss scale: 16384.0 | grad norm: 112139.715 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3988/ 159576 | consumed samples: 80368 | elapsed time per iteration (ms): 15004.4 | learning rate: 2.226E-05 | global batch size: 32 | lm loss: 6.551664E+00 | loss scale: 16384.0 | grad norm: 88397.931 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3989/ 159576 | consumed samples: 80400 | elapsed time per 
iteration (ms): 14610.9 | learning rate: 2.226E-05 | global batch size: 32 | lm loss: 6.531049E+00 | loss scale: 16384.0 | grad norm: 63924.116 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3990/ 159576 | consumed samples: 80432 | elapsed time per iteration (ms): 14532.5 | learning rate: 2.227E-05 | global batch size: 32 | lm loss: 6.546918E+00 | loss scale: 16384.0 | grad norm: 97299.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3991/ 159576 | consumed samples: 80464 | elapsed time per iteration (ms): 14437.4 | learning rate: 2.228E-05 | global batch size: 32 | lm loss: 6.471569E+00 | loss scale: 16384.0 | grad norm: 76326.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3992/ 159576 | consumed samples: 80496 | elapsed time per iteration (ms): 14906.8 | learning rate: 2.229E-05 | global batch size: 32 | lm loss: 6.525407E+00 | loss scale: 16384.0 | grad norm: 77183.511 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3993/ 159576 | consumed samples: 80528 | elapsed time per iteration (ms): 14534.2 | learning rate: 2.230E-05 | global batch size: 32 | lm loss: 6.539597E+00 | loss scale: 16384.0 | grad norm: 60376.571 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3994/ 159576 | consumed samples: 80560 | elapsed time per iteration (ms): 14579.3 | learning rate: 2.231E-05 | global batch size: 32 | lm loss: 6.552666E+00 | loss scale: 16384.0 | grad norm: 84746.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3995/ 159576 | consumed samples: 80592 | elapsed time per iteration (ms): 14529.3 | learning rate: 2.232E-05 | global batch size: 32 | lm loss: 6.413946E+00 | loss scale: 16384.0 | grad norm: 67969.641 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3996/ 159576 | consumed samples: 80624 | elapsed time per iteration (ms): 14922.8 | learning rate: 2.233E-05 | global batch size: 32 | lm loss: 6.427969E+00 | loss scale: 16384.0 | grad norm: 69649.626 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3997/ 159576 | consumed samples: 80656 | elapsed time per iteration (ms): 14490.2 | learning rate: 2.234E-05 | global batch size: 32 | lm loss: 6.527985E+00 | loss scale: 16384.0 | grad norm: 71644.429 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3998/ 159576 | consumed samples: 80688 | elapsed time per iteration (ms): 14523.9 | learning rate: 2.234E-05 | global batch size: 32 | lm loss: 6.565372E+00 | loss scale: 16384.0 | grad norm: 75823.635 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3999/ 159576 | consumed samples: 80720 | elapsed time per iteration (ms): 14567.0 | learning rate: 2.235E-05 | global batch size: 32 | lm loss: 6.377391E+00 | loss scale: 16384.0 | grad norm: 85967.590 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-24 17:49:17,238] [INFO] [logging.py:68:log_dist] [Rank 0] step=4000, skipped=6, lr=[2.2362426035502958e-05, 2.2362426035502958e-05], mom=[(0.9, 0.999), (0.9, 0.999)] steps: 4000 loss: 6.3572 iter time (s): 0.007 samples/sec: 4556.985 iteration 4000/ 159576 | 
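Two quick consistency checks on the block above, as a minimal sketch in Python (the tolerance and variable names are chosen here, not taken from the training scripts): the reported validation perplexity should be exp(validation lm loss), and consumed samples should advance by the global batch size (32) per iteration. Note also that the DeepSpeed line's "samples/sec: 4556.985" agrees with its own "iter time (s): 0.007" step timer (32 / 0.007 s), not with the ~14.5 s wall-clock iteration time reported by Megatron (32 / 14.5 s ~ 2.2 samples/s), so it apparently should not be read as end-to-end throughput.

import math

# PPL should be exp(loss): exp(6.453290) ~= 634.7876, matching
# "lm loss PPL: 6.347876E+02" in the validation block above.
assert abs(math.exp(6.453290) - 6.347876e+02) < 0.01

# Consumed samples advance by 32 per iteration: iteration 3920 reports
# 78192 samples, so iteration 4000 should report 78192 + 80 * 32 = 80752.
assert 78192 + (4000 - 3920) * 32 == 80752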
iteration 4001/ 159576 | consumed samples: 80784 | elapsed time per iteration (ms): 20796.3 | learning rate: 2.237E-05 | global batch size: 32 | lm loss: 6.357805E+00 | loss scale: 16384.0 | grad norm: 75271.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4002/ 159576 | consumed samples: 80816 | elapsed time per iteration (ms): 14528.3 | learning rate: 2.238E-05 | global batch size: 32 | lm loss: 6.590372E+00 | loss scale: 16384.0 | grad norm: 82823.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4003/ 159576 | consumed samples: 80848 | elapsed time per iteration (ms): 14569.0 | learning rate: 2.239E-05 | global batch size: 32 | lm loss: 6.547601E+00 | loss scale: 16384.0 | grad norm: 63495.848 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4004/ 159576 | consumed samples: 80880 | elapsed time per iteration (ms): 14981.7 | learning rate: 2.240E-05 | global batch size: 32 | lm loss: 6.488581E+00 | loss scale: 16384.0 | grad norm: 84538.823 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4005/ 159576 | consumed samples: 80912 | elapsed time per iteration (ms): 14517.6 | learning rate: 2.241E-05 | global batch size: 32 | lm loss: 6.473035E+00 | loss scale: 16384.0 | grad norm: 69154.929 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4006/ 159576 | consumed samples: 80944 | elapsed time per iteration (ms): 14515.3 | learning rate: 2.242E-05 | global batch size: 32 | lm loss: 6.574604E+00 | loss scale: 16384.0 | grad norm: 71258.786 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4007/ 159576 | consumed samples: 80976 | elapsed time per iteration (ms): 14530.3 | learning rate: 2.242E-05 | global batch size: 32 | lm loss: 6.480978E+00 | loss scale: 16384.0 | grad norm: 63598.555 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4008/ 159576 | consumed samples: 81008 | elapsed time per iteration (ms): 15052.4 | learning rate: 2.243E-05 | global batch size: 32 | lm loss: 6.393389E+00 | loss scale: 16384.0 | grad norm: 76474.916 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4009/ 159576 | consumed samples: 81040 | elapsed time per iteration (ms): 14618.9 | learning rate: 2.244E-05 | global batch size: 32 | lm loss: 6.322450E+00 | loss scale: 16384.0 | grad norm: 62736.146 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4010/ 159576 | consumed samples: 81072 | elapsed time per iteration (ms): 14521.7 | learning rate: 2.245E-05 | global batch size: 32 | lm loss: 6.502364E+00 | loss scale: 16384.0 | grad norm: 78751.861 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4011/ 159576 | consumed samples: 81104 | elapsed time per iteration (ms): 14513.4 | learning rate: 2.246E-05 | global batch size: 32 | lm loss: 6.504915E+00 | loss scale: 16384.0 | grad norm: 73290.420 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4012/ 159576 | consumed samples: 81136 | elapsed time per iteration (ms): 14859.5 | learning rate: 2.247E-05 | global batch size: 32 | lm loss: 6.422670E+00 | loss scale: 16384.0 | grad norm: 70911.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4013/ 159576 | consumed samples: 81168 | elapsed time per iteration (ms): 14562.7 | learning rate: 2.248E-05 | global batch size: 32 | lm loss: 6.460926E+00 | loss scale: 16384.0 | grad norm: 88361.679 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4014/ 159576 | consumed samples: 81200 | elapsed time per iteration (ms): 14537.6 | learning rate: 2.249E-05 | global batch size: 32 | lm loss: 6.359708E+00 | loss scale: 16384.0 | grad norm: 70950.803 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4015/ 159576 | consumed samples: 81232 | elapsed time per iteration (ms): 14575.5 | learning rate: 2.250E-05 | global batch size: 32 | lm loss: 6.479752E+00 | loss scale: 16384.0 | grad norm: 60916.908 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4016/ 159576 | consumed samples: 81264 | elapsed time per iteration (ms): 14890.4 | learning rate: 2.250E-05 | global batch size: 32 | lm loss: 6.438080E+00 | loss scale: 16384.0 | grad norm: 78503.860 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4017/ 159576 | consumed samples: 81296 | elapsed time per iteration (ms): 14519.4 | learning rate: 2.251E-05 | global batch size: 32 | lm loss: 6.446492E+00 | loss scale: 16384.0 | grad norm: 66299.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4018/ 159576 | consumed samples: 81328 | elapsed time per iteration (ms): 14512.9 | learning rate: 2.252E-05 | global batch size: 32 | lm loss: 6.418320E+00 | loss scale: 16384.0 | grad norm: 65936.043 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4019/ 159576 | consumed samples: 81360 | elapsed time per iteration (ms): 14568.1 | learning rate: 2.253E-05 | global batch size: 32 | lm loss: 6.337445E+00 | loss scale: 16384.0 | grad norm: 71727.512 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4020/ 159576 | consumed samples: 81392 | elapsed time per iteration (ms): 14867.3 | learning rate: 2.254E-05 | global batch size: 32 | lm loss: 6.564549E+00 | loss scale: 16384.0 | grad norm: 96122.107 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4021/ 159576 | consumed samples: 81424 | elapsed time per iteration (ms): 14435.4 | learning rate: 2.255E-05 | global batch size: 32 | lm loss: 6.485852E+00 | loss scale: 16384.0 | grad norm: 82597.736 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4022/ 159576 | consumed samples: 81456 | elapsed time per iteration (ms): 14558.0 | learning rate: 2.256E-05 | global batch size: 32 | lm loss: 6.539099E+00 | loss scale: 16384.0 | grad norm: 121006.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4023/ 159576 | consumed samples: 81488 | elapsed time per iteration (ms): 14530.8 | learning rate: 2.257E-05 | global batch size: 32 | lm loss: 6.588836E+00 | loss scale: 16384.0 | grad norm: 83990.530 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4024/ 159576 | consumed samples: 81520 | elapsed time per iteration (ms): 14903.1 | learning rate: 2.258E-05 | global batch size: 32 | lm loss: 6.478038E+00 | loss scale: 16384.0 | grad norm: 86310.728 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4025/ 159576 | consumed samples: 81552 | elapsed time per iteration (ms): 14640.8 | learning rate: 2.258E-05 | global batch size: 32 | lm loss: 6.423618E+00 | loss scale: 16384.0 | grad norm: 72646.553 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4026/ 159576 | consumed samples: 81584 | elapsed time per iteration (ms): 14523.1 | learning rate: 2.259E-05 | global batch size: 32 | lm loss: 6.389876E+00 | loss scale: 16384.0 | grad norm: 75260.682 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4027/ 159576 | consumed samples: 81616 | elapsed time per iteration (ms): 14495.3 | learning rate: 2.260E-05 | global batch size: 32 | lm loss: 6.686980E+00 | loss scale: 16384.0 | grad norm: 68901.893 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4028/ 159576 | consumed samples: 81648 | elapsed time per iteration (ms): 14518.7 | learning rate: 2.261E-05 | global batch size: 32 | lm loss: 6.454273E+00 | loss scale: 16384.0 | grad norm: 78058.506 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4029/ 159576 | consumed samples: 81680 | elapsed time per iteration (ms): 14751.7 | learning rate: 2.262E-05 | global batch size: 32 | lm loss: 6.645922E+00 | loss scale: 16384.0 | grad norm: 90877.563 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4030/ 159576 | consumed samples: 81712 | elapsed time per iteration (ms): 14605.8 | learning rate: 2.263E-05 | global batch size: 32 | lm loss: 6.554152E+00 | loss scale: 16384.0 | grad norm: 71333.048 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4031/ 159576 | consumed samples: 81744 | elapsed time per iteration (ms): 14567.0 | learning rate: 2.264E-05 | global batch size: 32 | lm loss: 6.512757E+00 | loss scale: 16384.0 | grad norm: 75409.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4032/ 159576 | consumed samples: 81776 | elapsed time per iteration (ms): 14627.7 | learning rate: 2.265E-05 | global batch size: 32 | lm loss: 6.529600E+00 | loss scale: 16384.0 | grad norm: 83852.632 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4033/ 159576 | consumed samples: 81808 | elapsed time per iteration (ms): 14706.7 | learning rate: 2.266E-05 | global batch size: 32 | lm loss: 6.312231E+00 | loss scale: 16384.0 | grad norm: 64610.818 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4034/ 159576 | consumed samples: 81840 | elapsed time per iteration (ms): 14453.1 | learning rate: 2.266E-05 | global batch size: 32 | lm loss: 6.378237E+00 | loss scale: 16384.0 | grad norm: 70363.183 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4035/ 159576 | consumed samples: 81872 | elapsed time per iteration (ms): 14558.4 | learning rate: 2.267E-05 | global batch size: 32 | lm loss: 6.617406E+00 | loss scale: 16384.0 | grad norm: 76776.869 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4036/ 159576 | consumed samples: 81904 | elapsed time per iteration (ms): 14451.4 | learning rate: 2.268E-05 | global batch size: 32 | lm loss: 6.510260E+00 | loss scale: 16384.0 | grad norm: 65763.594 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4037/ 159576 | consumed samples: 81936 | elapsed time per iteration (ms): 14734.4 | learning rate: 2.269E-05 | global batch size: 32 | lm loss: 6.484540E+00 | loss scale: 16384.0 | grad norm: 113964.842 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4038/ 159576 | consumed samples: 81968 | elapsed time per iteration (ms): 14560.9 | learning rate: 2.270E-05 | global batch size: 32 | lm loss: 6.422564E+00 | loss scale: 16384.0 | grad norm: 71196.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4039/ 159576 | consumed samples: 82000 | elapsed time per iteration (ms): 14521.4 | learning rate: 2.271E-05 | global batch size: 32 | lm loss: 6.468810E+00 | loss scale: 16384.0 | grad norm: 81464.635 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4040/ 159576 | consumed samples: 82032 | elapsed time per iteration (ms): 14534.9 | learning rate: 2.272E-05 | global batch size: 32 | lm loss: 6.528829E+00 | loss scale: 16384.0 | grad norm: 64883.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4041/ 159576 | consumed samples: 82064 | elapsed time per iteration (ms): 14840.7 | learning rate: 2.273E-05 | global batch size: 32 | lm loss: 6.466451E+00 | loss scale: 16384.0 | grad norm: 113319.594 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4042/ 159576 | consumed samples: 82096 | elapsed time per iteration (ms): 14627.3 | learning rate: 2.274E-05 | global batch size: 32 | lm loss: 6.455089E+00 | loss scale: 16384.0 | grad norm: 63704.855 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4043/ 159576 | consumed samples: 82128 | elapsed time per iteration (ms): 14401.0 | learning rate: 2.274E-05 | global batch size: 32 | lm loss: 6.394213E+00 | loss scale: 16384.0 | grad norm: 104510.525 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4044/ 159576 | consumed samples: 82160 | elapsed time per iteration (ms): 14522.2 | learning rate: 2.275E-05 | global batch size: 32 | lm loss: 6.436733E+00 | loss scale: 16384.0 | grad norm: 69916.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4045/ 159576 | consumed samples: 82192 | elapsed time per iteration (ms): 14878.3 | learning rate: 2.276E-05 | global batch size: 32 | lm loss: 6.467334E+00 | loss scale: 16384.0 | grad norm: 86814.439 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4046/ 159576 | consumed samples: 82224 | elapsed time per iteration (ms): 14619.5 | learning rate: 2.277E-05 | global batch size: 32 | lm loss: 6.542828E+00 | loss scale: 16384.0 | grad norm: 91169.836 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4047/ 159576 | consumed samples: 82256 | elapsed time per iteration (ms): 14546.0 | learning rate: 2.278E-05 | global batch size: 32 | lm loss: 6.482902E+00 | loss scale: 16384.0 | grad norm: 71855.514 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4048/ 159576 | consumed samples: 82288 | elapsed time per iteration (ms): 14535.3 | learning rate: 2.279E-05 | global batch size: 32 | lm loss: 6.380974E+00 | loss scale: 16384.0 | grad norm: 110448.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4049/ 159576 | consumed samples: 82320 | elapsed time per iteration (ms): 14946.7 | learning rate: 2.280E-05 | global batch size: 32 | lm loss: 6.604033E+00 | loss scale: 16384.0 | grad norm: 86973.778 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4050/ 159576 | consumed samples: 82352 | elapsed time per iteration (ms): 14452.3 | learning rate: 2.281E-05 | global batch size: 32 | lm loss: 6.485418E+00 | loss scale: 16384.0 | grad norm: 93547.929 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4051/ 159576 | consumed samples: 82384 | elapsed time per iteration (ms): 14486.7 | learning rate: 2.282E-05 | global batch size: 32 | lm loss: 6.447795E+00 | loss scale: 16384.0 | grad norm: 71623.174 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4052/ 159576 | consumed samples: 82416 | elapsed time per iteration (ms): 14546.0 | learning rate: 2.282E-05 | global batch size: 32 | lm loss: 6.490433E+00 | loss scale: 16384.0 | grad norm: 122748.723 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4053/ 159576 | consumed samples: 82448 | elapsed time per iteration (ms): 14923.8 | learning rate: 2.283E-05 | global batch size: 32 | lm loss: 6.393107E+00 | loss scale: 16384.0 | grad norm: 94716.038 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4054/ 159576 | consumed samples: 82480 | elapsed time per iteration (ms): 14522.3 | learning rate: 2.284E-05 | global batch size: 32 | lm loss: 6.560749E+00 | loss scale: 16384.0 | grad norm: 87911.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4055/ 159576 | consumed samples: 82512 | elapsed time per iteration (ms): 14576.1 | learning rate: 2.285E-05 | global batch size: 32 | lm loss: 6.508199E+00 | loss scale: 16384.0 | grad norm: 75712.942 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4056/ 159576 | consumed samples: 82544 | elapsed time per iteration (ms): 14509.2 | learning rate: 2.286E-05 | global batch size: 32 | lm loss: 6.480619E+00 | loss scale: 16384.0 | grad norm: 92968.738 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4057/ 159576 | consumed samples: 82576 | elapsed time per iteration (ms): 14814.4 | learning rate: 2.287E-05 | global batch size: 32 | lm loss: 6.324226E+00 | loss scale: 16384.0 | grad norm: 78472.900 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4058/ 159576 | consumed samples: 82608 | elapsed time per iteration (ms): 14459.3 | learning rate: 2.288E-05 | global batch size: 32 | lm loss: 6.626959E+00 | loss scale: 16384.0 | grad norm: 80531.732 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4059/ 159576 | consumed samples: 82640 | elapsed time per iteration (ms): 14496.4 | learning rate: 2.289E-05 | global batch size: 32 | lm loss: 6.406682E+00 | loss scale: 16384.0 | grad norm: 75308.856 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4060/ 159576 | consumed samples: 82672 | elapsed time per iteration (ms): 14562.2 | learning rate: 2.289E-05 | global batch size: 32 | lm loss: 6.440542E+00 | loss scale: 16384.0 | grad norm: 78114.884 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4061/ 159576 | consumed samples: 82704 | elapsed time per iteration (ms): 14796.0 | learning rate: 2.290E-05 | global batch size: 32 | lm loss: 6.468933E+00 | loss scale: 16384.0 | grad norm: 77154.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4062/ 159576 | consumed samples: 82736 | elapsed time per iteration (ms): 14696.5 | learning rate: 2.291E-05 | global batch size: 32 | lm loss: 6.318196E+00 | loss scale: 16384.0 | grad norm: 97551.121 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4063/ 159576 | consumed samples: 82768 | elapsed time per iteration (ms): 14468.1 | learning rate: 2.292E-05 | global batch size: 32 | lm loss: 6.472930E+00 | loss scale: 16384.0 | grad norm: 110041.778 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4064/ 159576 | consumed samples: 82800 | elapsed time per iteration (ms): 14496.2 | learning rate: 2.293E-05 | global batch size: 32 | lm loss: 6.523721E+00 | loss scale: 16384.0 | grad norm: 88018.768 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4065/ 159576 | consumed samples: 82832 | elapsed time per iteration (ms): 14563.8 | learning rate: 2.294E-05 | global batch size: 32 | lm loss: 6.453180E+00 | loss scale: 16384.0 | grad norm: 83087.922 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4066/ 159576 | consumed samples: 82864 | elapsed time per iteration (ms): 14884.4 | learning rate: 2.295E-05 | global batch size: 32 | lm loss: 6.447326E+00 | loss scale: 16384.0 | grad norm: 72433.599 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4067/ 159576 | consumed samples: 82896 | elapsed time per iteration (ms): 14491.5 | learning rate: 2.296E-05 | global batch size: 32 | lm loss: 6.366633E+00 | loss scale: 16384.0 | grad norm: 100504.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4068/ 159576 | consumed samples: 82928 | elapsed time per iteration (ms): 14561.6 | learning rate: 2.297E-05 | global batch size: 32 | lm loss: 6.315294E+00 | loss scale: 16384.0 | grad norm: 79868.630 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4069/ 159576 | consumed samples: 82960 | elapsed time per iteration (ms): 14538.6 | learning rate: 2.297E-05 | global batch size: 32 | lm loss: 6.452709E+00 | loss scale: 16384.0 | grad norm: 94073.963 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4070/ 159576 | consumed samples: 82992 | elapsed time per iteration (ms): 14651.1 | learning rate: 2.298E-05 | global batch size: 32 | lm loss: 6.421084E+00 | loss scale: 16384.0 | grad norm: 96558.906 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4071/ 159576 | consumed samples: 83024 | elapsed time per iteration (ms): 14508.0 | learning rate: 2.299E-05 | global batch size: 32 | lm loss: 6.474918E+00 | loss scale: 16384.0 | grad norm: 104437.791 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4072/ 159576 | consumed samples: 83056 | elapsed time per iteration (ms): 14540.3 | learning rate: 2.300E-05 | global batch size: 32 | lm loss: 6.442264E+00 | loss scale: 16384.0 | grad norm: 69985.883 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-24 18:07:07] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition)
[2021-09-24 18:07:07] PULSE: tr8-104B is running for 12:14:56 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8])
iteration 4073/ 159576 | consumed samples: 83088 | elapsed time per iteration (ms): 14430.9 | learning rate: 2.301E-05 | global batch size: 32 | lm loss: 6.464416E+00 | loss scale: 16384.0 | grad norm: 92935.764 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4074/ 159576 | consumed samples: 83120 | elapsed time per iteration (ms): 14595.5 | learning rate: 2.302E-05 | global batch size: 32 | lm loss: 6.394172E+00 | loss scale: 16384.0 | grad norm: 93727.497 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4075/ 159576 | consumed samples: 83152 | elapsed time per iteration (ms): 14478.6 | learning rate: 2.303E-05 | global batch size: 32 | lm loss: 6.535138E+00 | loss scale: 16384.0 | grad norm: 110910.133 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4076/ 159576 | consumed samples: 83184 | elapsed time per iteration (ms): 14559.7 | learning rate: 2.304E-05 | global batch size: 32 | lm loss: 6.459756E+00 | loss scale: 16384.0 | grad norm: 79798.141 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4077/ 159576 | consumed samples: 83216 | elapsed time per iteration (ms): 14529.0 | learning rate: 2.305E-05 | global batch size: 32 | lm loss: 6.388766E+00 | loss scale: 16384.0 | grad norm: 80153.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4078/ 159576 | consumed samples: 83248 | elapsed time per iteration (ms): 15028.3 | learning rate: 2.305E-05 | global batch size: 32 | lm loss: 6.462305E+00 | loss scale: 16384.0 | grad norm: 72541.670 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
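For anyone charting this run, the records above parse easily once each iteration is on its own line. Below is a minimal sketch, assuming the console output has been captured to a file (the tr8-104B.log name and the regex are assumptions here, not part of the training tooling); it extracts (iteration, consumed samples, lm loss) triples and applies a trailing moving average to tame the per-iteration noise.

import re

LOG = "tr8-104B.log"  # hypothetical capture of this console output

# Matches lines like:
# iteration 4001/ 159576 | consumed samples: 80784 | ... | lm loss: 6.357805E+00 | ...
PATTERN = re.compile(
    r"iteration\s+(\d+)/\s*\d+ \| consumed samples: (\d+).*?lm loss: ([0-9.E+-]+)"
)

points = []
with open(LOG) as handle:
    for line in handle:
        match = PATTERN.search(line)
        if match:
            points.append(
                (int(match.group(1)), int(match.group(2)), float(match.group(3)))
            )

# Trailing moving average over the last `window` losses.
window = 50
losses = [loss for _, _, loss in points]
smoothed = [
    sum(losses[max(0, i + 1 - window):i + 1]) / (i + 1 - max(0, i + 1 - window))
    for i in range(len(losses))
]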
iteration 4079/ 159576 | consumed samples: 83280 | elapsed time per iteration (ms): 14501.7 | learning rate: 2.306E-05 | global batch size: 32 | lm loss: 6.606649E+00 | loss scale: 16384.0 | grad norm: 72682.132 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4080/ 159576 | consumed samples: 83312 | elapsed time per iteration (ms): 14478.7 | learning rate: 2.307E-05 | global batch size: 32 | lm loss: 6.339183E+00 | loss scale: 16384.0 | grad norm: 77952.104 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4081/ 159576 | consumed samples: 83344 | elapsed time per iteration (ms): 14534.3 | learning rate: 2.308E-05 | global batch size: 32 | lm loss: 6.482682E+00 | loss scale: 16384.0 | grad norm: 78541.532 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4082/ 159576 | consumed samples: 83376 | elapsed time per iteration (ms): 14971.6 | learning rate: 2.309E-05 | global batch size: 32 | lm loss: 6.464870E+00 | loss scale: 16384.0 | grad norm: 82812.736 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4083/ 159576 | consumed samples: 83408 | elapsed time per iteration (ms): 14619.1 | learning rate: 2.310E-05 | global batch size: 32 | lm loss: 6.468065E+00 | loss scale: 16384.0 | grad norm: 95549.999 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4084/ 159576 | consumed samples: 83440 | elapsed time per iteration (ms): 14580.8 | learning rate: 2.311E-05 | global batch size: 32 | lm loss: 6.390970E+00 | loss scale: 16384.0 | grad norm: 76775.584 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4085/ 159576 | consumed samples: 83472 | elapsed time per iteration (ms): 14597.4 | learning rate: 2.312E-05 | global batch size: 32 | lm loss: 6.441597E+00 | loss scale: 16384.0 | grad norm: 87885.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4086/ 159576 | consumed samples: 83504 | elapsed time per iteration (ms): 14827.9 | learning rate: 2.313E-05 | global batch size: 32 | lm loss: 6.332308E+00 | loss scale: 16384.0 | grad norm: 67530.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4087/ 159576 | consumed samples: 83536 | elapsed time per iteration (ms): 14496.3 | learning rate: 2.313E-05 | global batch size: 32 | lm loss: 6.360069E+00 | loss scale: 16384.0 | grad norm: 65277.636 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4088/ 159576 | consumed samples: 83568 | elapsed time per iteration (ms): 14505.1 | learning rate: 2.314E-05 | global batch size: 32 | lm loss: 6.331870E+00 | loss scale: 16384.0 | grad norm: 73276.808 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4089/ 159576 | consumed samples: 83600 | elapsed time per iteration (ms): 14518.3 | learning rate: 2.315E-05 | global batch size: 32 | lm loss: 6.279953E+00 | loss scale: 16384.0 | grad norm: 69193.657 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4090/ 159576 | consumed samples: 83632 | elapsed time per iteration (ms): 14816.9 | learning rate: 2.316E-05 | global batch size: 32 | lm loss: 6.473932E+00 | loss scale: 16384.0 | grad norm: 78838.749 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4091/ 159576 | consumed samples: 83664 | elapsed time per iteration (ms): 14589.1 | learning rate: 2.317E-05 | global batch size: 32 | lm loss: 6.346605E+00 | loss scale: 16384.0 | grad norm: 76401.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4092/ 159576 | consumed samples: 83696 | elapsed time per iteration (ms): 14611.5 | learning rate: 2.318E-05 | global batch size: 32 | lm loss: 6.444325E+00 | loss scale: 16384.0 | grad norm: 85411.693 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4093/ 159576 | consumed samples: 83728 | elapsed time per iteration (ms): 14540.2 | learning rate: 2.319E-05 | global batch size: 32 | lm loss: 6.498468E+00 | loss scale: 16384.0 | grad norm: 97013.865 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4094/ 159576 | consumed samples: 83760 | elapsed time per iteration (ms): 14934.5 | learning rate: 2.320E-05 | global batch size: 32 | lm loss: 6.368524E+00 | loss scale: 16384.0 | grad norm: 75310.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4095/ 159576 | consumed samples: 83792 | elapsed time per iteration (ms): 14479.4 | learning rate: 2.321E-05 | global batch size: 32 | lm loss: 6.445729E+00 | loss scale: 16384.0 | grad norm: 79666.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4096/ 159576 | consumed samples: 83824 | elapsed time per iteration (ms): 14539.3 | learning rate: 2.321E-05 | global batch size: 32 | lm loss: 6.478226E+00 | loss scale: 16384.0 | grad norm: 74953.641 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4097/ 159576 | consumed samples: 83856 | elapsed time per iteration (ms): 14544.9 | learning rate: 2.322E-05 | global batch size: 32 | lm loss: 6.494800E+00 | loss scale: 16384.0 | grad norm: 83444.792 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4098/ 159576 | consumed samples: 83888 | elapsed time per iteration (ms): 14987.3 | learning rate: 2.323E-05 | global batch size: 32 | lm loss: 6.549989E+00 | loss scale: 16384.0 | grad norm: 73065.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4099/ 159576 | consumed samples: 83920 | elapsed time per iteration (ms): 14510.7 | learning rate: 2.324E-05 | global batch size: 32 | lm loss: 6.523539E+00 | loss scale: 16384.0 | grad norm: 83625.749 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4100/ 159576 | consumed samples: 83952 | elapsed time per iteration (ms): 14610.5 | learning rate: 2.325E-05 | global batch size: 32 | lm loss: 6.451036E+00 | loss scale: 16384.0 | grad norm: 74563.493 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4101/ 159576 | consumed samples: 83984 | elapsed time per iteration (ms): 14604.4 | learning rate: 2.326E-05 | global batch size: 32 | lm loss: 6.472479E+00 | loss scale: 16384.0 | grad norm: 109783.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4102/ 159576 | consumed samples: 84016 | elapsed time per iteration (ms): 14804.2 | learning rate: 2.327E-05 | global batch size: 32 | lm loss: 6.392324E+00 | loss scale: 16384.0 | grad norm: 77708.767 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4103/ 159576 | consumed samples: 84048 | elapsed time per iteration (ms): 14666.7 | learning rate: 2.328E-05 | global batch size: 32 | lm loss: 6.388014E+00 | loss scale: 16384.0 | grad norm: 72228.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4104/ 159576 | consumed samples: 84080 | elapsed time per iteration (ms): 14567.0 | learning rate: 2.329E-05 | global batch size: 32 | lm loss: 6.351237E+00 | loss scale: 16384.0 | grad norm: 75762.926 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4105/ 159576 | consumed samples: 84112 | elapsed time per iteration (ms): 14512.3 | learning rate: 2.329E-05 | global batch size: 32 | lm loss: 6.445687E+00 | loss scale: 16384.0 | grad norm: 71985.473 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4106/ 159576 | consumed samples: 84144 | elapsed time per iteration (ms): 14555.0 | learning rate: 2.330E-05 | global batch size: 32 | lm loss: 6.450569E+00 | loss scale: 16384.0 | grad norm: 70873.734 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4107/ 159576 | consumed samples: 84176 | elapsed time per iteration (ms): 14836.4 | learning rate: 2.331E-05 | global batch size: 32 | lm loss: 6.490268E+00 | loss scale: 16384.0 | grad norm: 62324.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4108/ 159576 | consumed samples: 84208 | elapsed time per iteration (ms): 14607.5 | learning rate: 2.332E-05 | global batch size: 32 | lm loss: 6.503112E+00 | loss scale: 16384.0 | grad norm: 80147.648 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4109/ 159576 | consumed samples: 84240 | elapsed time per iteration (ms): 14516.1 | learning rate: 2.333E-05 | global batch size: 32 | lm loss: 6.575756E+00 | loss scale: 16384.0 | grad norm: 85277.958 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4110/ 159576 | consumed samples: 84272 | elapsed time per iteration (ms): 14534.3 | learning rate: 2.334E-05 | global batch size: 32 | lm loss: 6.521991E+00 | loss scale: 16384.0 | grad norm: 88147.911 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4111/ 159576 | consumed samples: 84304 | elapsed time per iteration (ms): 14643.4 | learning rate: 2.335E-05 | global batch size: 32 | lm loss: 6.583647E+00 | loss scale: 16384.0 | grad norm: 90470.119 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4112/ 159576 | consumed samples: 84336 | elapsed time per iteration (ms): 14501.6 | learning rate: 2.336E-05 | global batch size: 32 | lm loss: 6.307788E+00 | loss scale: 16384.0 | grad norm: 84679.029 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4113/ 159576 | consumed samples: 84368 | elapsed time per iteration (ms): 14565.5 | learning rate: 2.337E-05 | global batch size: 32 | lm loss: 6.392709E+00 | loss scale: 16384.0 | grad norm: 85222.050 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4114/ 159576 | consumed samples: 84400 | elapsed time per iteration (ms): 14580.4 | learning rate: 2.337E-05 | global batch size: 32 | lm loss: 6.384982E+00 | loss scale: 16384.0 | grad norm: 101932.152 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4115/ 159576 | consumed samples: 84432 | elapsed time per iteration (ms): 14793.7 | learning rate: 2.338E-05 | global batch size: 32 | lm loss: 6.402984E+00 | loss scale: 16384.0 | grad norm: 80725.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4116/ 159576 | consumed samples: 84464 | elapsed time per iteration (ms): 14599.8 | learning rate: 2.339E-05 | global batch size: 32 | lm loss: 6.431032E+00 | loss scale: 16384.0 | grad norm: 88365.957 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4117/ 159576 | consumed samples: 84496 | elapsed time per iteration (ms): 14529.0 | learning rate: 2.340E-05 | global batch size: 32 | lm loss: 6.544386E+00 | loss scale: 16384.0 | grad norm: 94647.177 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4118/ 159576 | consumed samples: 84528 | elapsed time per iteration (ms): 14520.8 | learning rate: 2.341E-05 | global batch size: 32 | lm loss: 6.494756E+00 | loss scale: 16384.0 | grad norm: 127914.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4119/ 159576 | consumed samples: 84560 | elapsed time per iteration (ms): 14810.4 | learning rate: 2.342E-05 | global batch size: 32 | lm loss: 6.676927E+00 | loss scale: 16384.0 | grad norm: 255152.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4120/ 159576 | consumed samples: 84592 | elapsed time per iteration (ms): 14553.6 | learning rate: 2.343E-05 | global batch size: 32 | lm loss: 6.521421E+00 | loss scale: 16384.0 | grad norm: 88738.154 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4121/ 159576 | consumed samples: 84624 | elapsed time per iteration (ms): 14615.1 | learning rate: 2.344E-05 | global batch size: 32 | lm loss: 6.422895E+00 | loss scale: 16384.0 | grad norm: 69394.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4122/ 159576 | consumed samples: 84656 | elapsed time per iteration (ms): 14526.7 | learning rate: 2.345E-05 | global batch size: 32 | lm loss: 6.391778E+00 | loss scale: 16384.0 | grad norm: 75006.078 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4123/ 159576 | consumed samples: 84688 | elapsed time per iteration (ms): 14981.6 | learning rate: 2.345E-05 | global batch size: 32 | lm loss: 6.569616E+00 | loss scale: 16384.0 | grad norm: 89357.812 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4124/ 159576 | consumed samples: 84720 | elapsed time per iteration (ms): 14751.3 | learning rate: 2.346E-05 | global batch size: 32 | lm loss: 6.522147E+00 | loss scale: 16384.0 | grad norm: 83006.179 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4125/ 159576 | consumed samples: 84752 | elapsed time per iteration (ms): 14464.7 | learning rate: 2.347E-05 | global batch size: 32 | lm loss: 6.443343E+00 | loss scale: 16384.0 | grad norm: 85692.827 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4126/ 159576 | consumed samples: 84784 | elapsed time per iteration (ms): 14544.8 | learning rate: 2.348E-05 | global batch size: 32 | lm loss: 6.447396E+00 | loss scale: 16384.0 | grad norm: 75026.495 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4127/ 159576 | consumed samples: 84816 | elapsed time per iteration (ms): 14837.3 | learning rate: 2.349E-05 | global batch size: 32 | lm loss: 6.407457E+00 | loss scale: 16384.0 | grad norm: 68031.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4128/ 159576 | consumed samples: 84848 | elapsed time per iteration (ms): 14497.8 | learning rate: 2.350E-05 | global batch size: 32 | lm loss: 6.509037E+00 | loss scale: 16384.0 | grad norm: 81823.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4129/ 159576 | consumed samples: 84880 | elapsed time per iteration (ms): 14560.1 | learning rate: 2.351E-05 | global batch size: 32 | lm loss: 6.349816E+00 | loss scale: 16384.0 | grad norm: 72346.540 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4130/ 159576 | consumed samples: 84912 | elapsed time per iteration (ms): 14548.5 | learning rate: 2.352E-05 | global batch size: 32 | lm loss: 6.479569E+00 | loss scale: 16384.0 | grad norm: 87336.645 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4131/ 159576 | consumed samples: 84944 | elapsed time per iteration (ms): 14910.1 | learning rate: 2.353E-05 | global batch size: 32 | lm loss: 6.617517E+00 | loss scale: 16384.0 | grad norm: 86374.633 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4132/ 159576 | consumed samples: 84976 | elapsed time per iteration (ms): 14494.2 | learning rate: 2.353E-05 | global batch size: 32 | lm loss: 6.465295E+00 | loss scale: 16384.0 | grad norm: 84022.738 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4133/ 159576 | consumed samples: 85008 | elapsed time per iteration (ms): 14507.6 | learning rate: 2.354E-05 | global batch size: 32 | lm loss: 6.496157E+00 | loss scale: 16384.0 | grad norm: 84787.804 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4134/ 159576 | consumed samples: 85040 | elapsed time per iteration (ms): 14524.7 | learning rate: 2.355E-05 | global batch size: 32 | lm loss: 6.413724E+00 | loss scale: 16384.0 | grad norm: 85852.526 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4135/ 159576 | consumed samples: 85072 | elapsed time per iteration (ms): 14838.8 | learning rate: 2.356E-05 | global batch size: 32 | lm loss: 6.625166E+00 | loss scale: 16384.0 | grad norm: 94635.595 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4136/ 159576 | consumed samples: 85104 | elapsed time per iteration (ms): 14542.4 | learning rate: 2.357E-05 | global batch size: 32 | lm loss: 6.407034E+00 | loss scale: 16384.0 | grad norm: 84861.680 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4137/ 159576 | consumed samples: 85136 | elapsed time per iteration (ms): 14613.1 | learning rate: 2.358E-05 | global batch size: 32 | lm loss: 6.522691E+00 | loss scale: 16384.0 | grad norm: 90819.589 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4138/ 159576 | consumed samples: 85168 | elapsed time per iteration (ms): 14588.1 | learning rate: 2.359E-05 | global batch size: 32 | lm loss: 6.515704E+00 | loss scale: 16384.0 | grad norm: 84641.662 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4139/ 159576 | consumed samples: 85200 | elapsed time per iteration (ms): 14775.7 | learning rate: 2.360E-05 | global batch size: 32 | lm loss: 6.462790E+00 | loss scale: 16384.0 | grad norm: 109335.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4140/ 159576 | consumed samples: 85232 | elapsed time per iteration (ms): 14632.9 | learning rate: 2.361E-05 | global batch size: 32 | lm loss: 6.565165E+00 | loss scale: 16384.0 | grad norm: 101408.740 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4141/ 159576 | consumed samples: 85264 | elapsed time per iteration (ms): 14488.2 | learning rate: 2.361E-05 | global batch size: 32 | lm loss: 6.378877E+00 | loss scale: 16384.0 | grad norm: 85177.703 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4142/ 159576 | consumed samples: 85296 | elapsed time per iteration (ms): 14538.0 | learning rate: 2.362E-05 | global batch size: 32 | lm loss: 6.464640E+00 | loss scale: 16384.0 | grad norm: 107413.633 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4143/ 159576 | consumed samples: 85328 | elapsed time per iteration (ms): 14656.2 | learning rate: 2.363E-05 | global batch size: 32 | lm loss: 6.672103E+00 | loss scale: 16384.0 | grad norm: 79187.829 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4144/ 159576 | consumed samples: 85360 | elapsed time per iteration (ms): 14916.7 | learning rate: 2.364E-05 | global batch size: 32 | lm loss: 6.691429E+00 | loss scale: 16384.0 | grad norm: 105292.440 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4145/ 159576 | consumed samples: 85392 | elapsed time per iteration (ms): 14496.1 | learning rate: 2.365E-05 | global batch size: 32 | lm loss: 6.428411E+00 | loss scale: 16384.0 | grad norm: 81232.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4146/ 159576 | consumed samples: 85424 | elapsed time per iteration (ms): 14532.5 | learning rate: 2.366E-05 | global batch size: 32 | lm loss: 6.483904E+00 | loss scale: 16384.0 | grad norm: 117143.742 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4147/ 159576 | consumed samples: 85456 | elapsed time per iteration (ms): 14531.1 | learning rate: 2.367E-05 | global batch size: 32 | lm loss: 6.363456E+00 | loss scale: 16384.0 | grad norm: 88860.011 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4148/ 159576 | consumed samples: 85488 | elapsed time per iteration (ms): 14766.7 | learning rate: 2.368E-05 | global batch size: 32 | lm loss: 6.523079E+00 | loss scale: 16384.0 | grad norm: 87677.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4149/ 159576 | consumed samples: 85520 | elapsed time per iteration (ms): 14507.2 | learning rate: 2.368E-05 | global batch size: 32 | lm loss: 6.553520E+00 | loss scale: 16384.0 | grad norm: 121742.594 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4150/ 159576 | consumed samples: 85552 | elapsed time per iteration (ms): 14548.6 | learning rate: 2.369E-05 | global batch size: 32 | lm loss: 6.490498E+00 | loss scale: 16384.0 | grad norm: 89599.956 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4151/ 159576 | consumed samples: 85584 | elapsed time per iteration (ms): 14535.8 | learning rate: 2.370E-05 | global batch size: 32 | lm loss: 6.498284E+00 | loss scale: 16384.0 | grad norm: 103857.489 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4152/ 159576 | consumed samples: 85616 | elapsed time per iteration (ms): 14637.7 | learning rate: 2.371E-05 | global batch size: 32 | lm loss: 6.607250E+00 | loss scale: 16384.0 | grad norm: 80792.955 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4153/ 159576 | consumed samples: 85648 | elapsed time per iteration (ms): 14584.8 | learning rate: 2.372E-05 | global batch size: 32 | lm loss: 6.465719E+00 | loss scale: 16384.0 | grad norm: 76852.004 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4154/ 159576 | consumed samples: 85680 | elapsed time per iteration (ms): 14575.3 | learning rate: 2.373E-05 | global batch size: 32 | lm loss: 6.475266E+00 | loss scale: 16384.0 | grad norm: 87775.649 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4155/ 159576 | consumed samples: 85712 | elapsed time per iteration (ms): 14452.5 | learning rate: 2.374E-05 | global batch size: 32 | lm loss: 6.456027E+00 | loss scale: 16384.0 | grad norm: 75377.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4156/ 159576 | consumed samples: 85744 | elapsed time per iteration (ms): 14769.4 | learning rate: 2.375E-05 | global batch size: 32 | lm loss: 6.436621E+00 | loss scale: 16384.0 | grad norm: 86270.120 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4157/ 159576 | consumed samples: 85776 | elapsed time per iteration (ms): 14484.6 | learning rate: 2.376E-05 | global batch size: 32 | lm loss: 6.502521E+00 | loss scale: 16384.0 | grad norm: 77291.631 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4158/ 159576 | consumed samples: 85808 | elapsed time per iteration (ms): 14605.4 | learning rate: 2.376E-05 | global batch size: 32 | lm loss: 6.271915E+00 | loss scale: 16384.0 | grad norm: 79782.510 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4159/ 159576 | consumed samples: 85840 | elapsed time per iteration (ms): 14468.5 | learning rate: 2.377E-05 | global batch size: 32 | lm loss: 6.375775E+00 | loss scale: 16384.0 | grad norm: 91679.045 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4160/ 159576 | consumed samples: 85872 | elapsed time per iteration (ms): 15055.2 | learning rate: 2.378E-05 | global batch
size: 32 | lm loss: 6.207356E+00 | loss scale: 16384.0 | grad norm: 84700.576 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4161/ 159576 | consumed samples: 85904 | elapsed time per iteration (ms): 14639.9 | learning rate: 2.379E-05 | global batch size: 32 | lm loss: 6.385208E+00 | loss scale: 16384.0 | grad norm: 77383.793 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4162/ 159576 | consumed samples: 85936 | elapsed time per iteration (ms): 14461.5 | learning rate: 2.380E-05 | global batch size: 32 | lm loss: 6.480938E+00 | loss scale: 16384.0 | grad norm: 98154.860 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4163/ 159576 | consumed samples: 85968 | elapsed time per iteration (ms): 14557.2 | learning rate: 2.381E-05 | global batch size: 32 | lm loss: 6.427241E+00 | loss scale: 16384.0 | grad norm: 79663.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4164/ 159576 | consumed samples: 86000 | elapsed time per iteration (ms): 15046.3 | learning rate: 2.382E-05 | global batch size: 32 | lm loss: 6.310709E+00 | loss scale: 16384.0 | grad norm: 76469.866 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4165/ 159576 | consumed samples: 86032 | elapsed time per iteration (ms): 14517.1 | learning rate: 2.383E-05 | global batch size: 32 | lm loss: 6.597423E+00 | loss scale: 16384.0 | grad norm: 95179.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4166/ 159576 | consumed samples: 86064 | elapsed time per iteration (ms): 14562.4 | learning rate: 2.384E-05 | global batch size: 32 | lm loss: 6.398317E+00 | loss scale: 16384.0 | grad norm: 86889.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4167/ 159576 | consumed samples: 86096 | elapsed time per iteration (ms): 14577.1 | learning rate: 2.384E-05 | global batch size: 32 | lm loss: 6.447660E+00 | loss scale: 16384.0 | grad norm: 99510.529 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4168/ 159576 | consumed samples: 86128 | elapsed time per iteration (ms): 14813.0 | learning rate: 2.385E-05 | global batch size: 32 | lm loss: 6.528482E+00 | loss scale: 16384.0 | grad norm: 83413.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4169/ 159576 | consumed samples: 86160 | elapsed time per iteration (ms): 14589.9 | learning rate: 2.386E-05 | global batch size: 32 | lm loss: 6.388697E+00 | loss scale: 16384.0 | grad norm: 76722.933 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4170/ 159576 | consumed samples: 86192 | elapsed time per iteration (ms): 14519.5 | learning rate: 2.387E-05 | global batch size: 32 | lm loss: 6.446240E+00 | loss scale: 16384.0 | grad norm: 85947.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4171/ 159576 | consumed samples: 86224 | elapsed time per iteration (ms): 14524.6 | learning rate: 2.388E-05 | global batch size: 32 | lm loss: 6.425363E+00 | loss scale: 16384.0 | grad norm: 88474.007 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4172/ 159576 | 
consumed samples: 86256 | elapsed time per iteration (ms): 14879.2 | learning rate: 2.389E-05 | global batch size: 32 | lm loss: 6.515138E+00 | loss scale: 16384.0 | grad norm: 108134.568 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4173/ 159576 | consumed samples: 86288 | elapsed time per iteration (ms): 14582.3 | learning rate: 2.390E-05 | global batch size: 32 | lm loss: 6.533965E+00 | loss scale: 16384.0 | grad norm: 76749.086 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4174/ 159576 | consumed samples: 86320 | elapsed time per iteration (ms): 14543.3 | learning rate: 2.391E-05 | global batch size: 32 | lm loss: 6.448212E+00 | loss scale: 16384.0 | grad norm: 93972.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4175/ 159576 | consumed samples: 86352 | elapsed time per iteration (ms): 14572.0 | learning rate: 2.392E-05 | global batch size: 32 | lm loss: 6.440217E+00 | loss scale: 16384.0 | grad norm: 102291.612 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4176/ 159576 | consumed samples: 86384 | elapsed time per iteration (ms): 14897.3 | learning rate: 2.392E-05 | global batch size: 32 | lm loss: 6.324600E+00 | loss scale: 16384.0 | grad norm: 81057.900 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4177/ 159576 | consumed samples: 86416 | elapsed time per iteration (ms): 14575.9 | learning rate: 2.393E-05 | global batch size: 32 | lm loss: 6.564878E+00 | loss scale: 16384.0 | grad norm: 96270.150 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4178/ 159576 | consumed samples: 86448 | elapsed time per iteration (ms): 14585.7 | learning rate: 2.394E-05 | global batch size: 32 | lm loss: 6.473108E+00 | loss scale: 16384.0 | grad norm: 80498.059 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4179/ 159576 | consumed samples: 86480 | elapsed time per iteration (ms): 14517.6 | learning rate: 2.395E-05 | global batch size: 32 | lm loss: 6.519761E+00 | loss scale: 16384.0 | grad norm: 90509.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4180/ 159576 | consumed samples: 86512 | elapsed time per iteration (ms): 14895.7 | learning rate: 2.396E-05 | global batch size: 32 | lm loss: 6.377243E+00 | loss scale: 16384.0 | grad norm: 92370.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4181/ 159576 | consumed samples: 86544 | elapsed time per iteration (ms): 14690.0 | learning rate: 2.397E-05 | global batch size: 32 | lm loss: 6.469300E+00 | loss scale: 16384.0 | grad norm: 89492.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4182/ 159576 | consumed samples: 86576 | elapsed time per iteration (ms): 14557.6 | learning rate: 2.398E-05 | global batch size: 32 | lm loss: 6.497668E+00 | loss scale: 16384.0 | grad norm: 104899.693 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4183/ 159576 | consumed samples: 86608 | elapsed time per iteration (ms): 14588.2 | learning rate: 2.399E-05 | global batch size: 32 | lm loss: 6.412446E+00 | loss scale: 16384.0 | grad norm: 81267.948 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4184/ 159576 | consumed samples: 86640 | elapsed time per iteration (ms): 14486.7 | learning rate: 2.400E-05 | global batch size: 32 | lm loss: 6.486274E+00 | loss scale: 16384.0 | grad norm: 95404.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4185/ 159576 | consumed samples: 86672 | elapsed time per iteration (ms): 14942.6 | learning rate: 2.400E-05 | global batch size: 32 | lm loss: 6.375100E+00 | loss scale: 16384.0 | grad norm: 82372.004 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4186/ 159576 | consumed samples: 86704 | elapsed time per iteration (ms): 14540.4 | learning rate: 2.401E-05 | global batch size: 32 | lm loss: 6.444688E+00 | loss scale: 16384.0 | grad norm: 102268.468 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4187/ 159576 | consumed samples: 86736 | elapsed time per iteration (ms): 14530.9 | learning rate: 2.402E-05 | global batch size: 32 | lm loss: 6.270885E+00 | loss scale: 16384.0 | grad norm: 85114.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4188/ 159576 | consumed samples: 86768 | elapsed time per iteration (ms): 14554.4 | learning rate: 2.403E-05 | global batch size: 32 | lm loss: 6.461191E+00 | loss scale: 16384.0 | grad norm: 82795.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4189/ 159576 | consumed samples: 86800 | elapsed time per iteration (ms): 14680.7 | learning rate: 2.404E-05 | global batch size: 32 | lm loss: 6.483377E+00 | loss scale: 16384.0 | grad norm: 106142.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4190/ 159576 | consumed samples: 86832 | elapsed time per iteration (ms): 14652.1 | learning rate: 2.405E-05 | global batch size: 32 | lm loss: 6.468819E+00 | loss scale: 16384.0 | grad norm: 83557.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4191/ 159576 | consumed samples: 86864 | elapsed time per iteration (ms): 14459.3 | learning rate: 2.406E-05 | global batch size: 32 | lm loss: 6.379012E+00 | loss scale: 16384.0 | grad norm: 90619.727 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4192/ 159576 | consumed samples: 86896 | elapsed time per iteration (ms): 14539.1 | learning rate: 2.407E-05 | global batch size: 32 | lm loss: 6.459314E+00 | loss scale: 16384.0 | grad norm: 94282.455 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4193/ 159576 | consumed samples: 86928 | elapsed time per iteration (ms): 14715.7 | learning rate: 2.408E-05 | global batch size: 32 | lm loss: 6.435170E+00 | loss scale: 16384.0 | grad norm: 92946.885 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4194/ 159576 | consumed samples: 86960 | elapsed time per iteration (ms): 14501.7 | learning rate: 2.408E-05 | global batch size: 32 | lm loss: 6.419791E+00 | loss scale: 16384.0 | grad norm: 78251.108 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4195/ 159576 | consumed samples: 86992 | elapsed time per iteration (ms): 14523.0 | learning rate: 
2.409E-05 | global batch size: 32 | lm loss: 6.342591E+00 | loss scale: 16384.0 | grad norm: 80571.454 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4196/ 159576 | consumed samples: 87024 | elapsed time per iteration (ms): 14595.3 | learning rate: 2.410E-05 | global batch size: 32 | lm loss: 6.373145E+00 | loss scale: 16384.0 | grad norm: 106409.932 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4197/ 159576 | consumed samples: 87056 | elapsed time per iteration (ms): 14737.5 | learning rate: 2.411E-05 | global batch size: 32 | lm loss: 6.543087E+00 | loss scale: 16384.0 | grad norm: 81359.049 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4198/ 159576 | consumed samples: 87088 | elapsed time per iteration (ms): 14570.3 | learning rate: 2.412E-05 | global batch size: 32 | lm loss: 6.555972E+00 | loss scale: 16384.0 | grad norm: 101442.652 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4199/ 159576 | consumed samples: 87120 | elapsed time per iteration (ms): 14518.0 | learning rate: 2.413E-05 | global batch size: 32 | lm loss: 6.497987E+00 | loss scale: 16384.0 | grad norm: 87789.780 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4200/ 159576 | consumed samples: 87152 | elapsed time per iteration (ms): 14561.0 | learning rate: 2.414E-05 | global batch size: 32 | lm loss: 6.526636E+00 | loss scale: 16384.0 | grad norm: 97375.608 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4201/ 159576 | consumed samples: 87184 | elapsed time per iteration (ms): 14967.8 | learning rate: 2.415E-05 | global batch size: 32 | lm loss: 6.529594E+00 | loss scale: 16384.0 | grad norm: 98056.606 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4202/ 159576 | consumed samples: 87216 | elapsed time per iteration (ms): 14591.5 | learning rate: 2.416E-05 | global batch size: 32 | lm loss: 6.461559E+00 | loss scale: 16384.0 | grad norm: 103248.801 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4203/ 159576 | consumed samples: 87248 | elapsed time per iteration (ms): 14557.3 | learning rate: 2.416E-05 | global batch size: 32 | lm loss: 6.255905E+00 | loss scale: 16384.0 | grad norm: 98489.984 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4204/ 159576 | consumed samples: 87280 | elapsed time per iteration (ms): 14539.8 | learning rate: 2.417E-05 | global batch size: 32 | lm loss: 6.456792E+00 | loss scale: 16384.0 | grad norm: 90220.601 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4205/ 159576 | consumed samples: 87312 | elapsed time per iteration (ms): 14936.2 | learning rate: 2.418E-05 | global batch size: 32 | lm loss: 6.456956E+00 | loss scale: 16384.0 | grad norm: 99591.028 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4206/ 159576 | consumed samples: 87344 | elapsed time per iteration (ms): 14602.1 | learning rate: 2.419E-05 | global batch size: 32 | lm loss: 6.539675E+00 | loss scale: 16384.0 | grad norm: 106461.971 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) 
iteration 4207/ 159576 | consumed samples: 87376 | elapsed time per iteration (ms): 14518.5 | learning rate: 2.420E-05 | global batch size: 32 | lm loss: 6.581583E+00 | loss scale: 16384.0 | grad norm: 104474.944 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4208/ 159576 | consumed samples: 87408 | elapsed time per iteration (ms): 14546.2 | learning rate: 2.421E-05 | global batch size: 32 | lm loss: 6.470299E+00 | loss scale: 16384.0 | grad norm: 103936.744 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4209/ 159576 | consumed samples: 87440 | elapsed time per iteration (ms): 14895.0 | learning rate: 2.422E-05 | global batch size: 32 | lm loss: 6.485046E+00 | loss scale: 16384.0 | grad norm: 103480.479 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4210/ 159576 | consumed samples: 87472 | elapsed time per iteration (ms): 14490.7 | learning rate: 2.423E-05 | global batch size: 32 | lm loss: 6.331614E+00 | loss scale: 16384.0 | grad norm: 92393.675 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4211/ 159576 | consumed samples: 87504 | elapsed time per iteration (ms): 14505.6 | learning rate: 2.424E-05 | global batch size: 32 | lm loss: 6.343493E+00 | loss scale: 16384.0 | grad norm: 138840.853 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4212/ 159576 | consumed samples: 87536 | elapsed time per iteration (ms): 14559.8 | learning rate: 2.424E-05 | global batch size: 32 | lm loss: 6.362164E+00 | loss scale: 16384.0 | grad norm: 105314.560 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4213/ 159576 | consumed samples: 87568 | elapsed time per iteration (ms): 14962.7 | learning rate: 2.425E-05 | global batch size: 32 | lm loss: 6.413978E+00 | loss scale: 16384.0 | grad norm: 100396.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4214/ 159576 | consumed samples: 87600 | elapsed time per iteration (ms): 14459.8 | learning rate: 2.426E-05 | global batch size: 32 | lm loss: 6.333343E+00 | loss scale: 16384.0 | grad norm: 101809.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4215/ 159576 | consumed samples: 87632 | elapsed time per iteration (ms): 14541.9 | learning rate: 2.427E-05 | global batch size: 32 | lm loss: 6.552740E+00 | loss scale: 16384.0 | grad norm: 198031.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4216/ 159576 | consumed samples: 87664 | elapsed time per iteration (ms): 14546.7 | learning rate: 2.428E-05 | global batch size: 32 | lm loss: 6.373903E+00 | loss scale: 16384.0 | grad norm: 98034.031 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4217/ 159576 | consumed samples: 87696 | elapsed time per iteration (ms): 14848.3 | learning rate: 2.429E-05 | global batch size: 32 | lm loss: 6.452424E+00 | loss scale: 16384.0 | grad norm: 267522.576 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4218/ 159576 | consumed samples: 87728 | elapsed time per iteration (ms): 14570.6 | learning rate: 2.430E-05 | global batch size: 32 | lm loss: 6.493920E+00 | loss scale: 16384.0 | grad norm: 121372.560 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4219/ 159576 | consumed samples: 87760 | elapsed time per iteration (ms): 14553.1 | learning rate: 2.431E-05 | global batch size: 32 | lm loss: 6.478834E+00 | loss scale: 16384.0 | grad norm: 112151.991 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4220/ 159576 | consumed samples: 87792 | elapsed time per iteration (ms): 14546.6 | learning rate: 2.432E-05 | global batch size: 32 | lm loss: 6.452081E+00 | loss scale: 16384.0 | grad norm: 164176.147 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4221/ 159576 | consumed samples: 87824 | elapsed time per iteration (ms): 14866.7 | learning rate: 2.432E-05 | global batch size: 32 | lm loss: 6.616721E+00 | loss scale: 16384.0 | grad norm: 88412.117 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4222/ 159576 | consumed samples: 87856 | elapsed time per iteration (ms): 14831.9 | learning rate: 2.433E-05 | global batch size: 32 | lm loss: 6.396004E+00 | loss scale: 16384.0 | grad norm: 116548.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4223/ 159576 | consumed samples: 87888 | elapsed time per iteration (ms): 14530.1 | learning rate: 2.434E-05 | global batch size: 32 | lm loss: 6.223457E+00 | loss scale: 16384.0 | grad norm: 151936.770 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4224/ 159576 | consumed samples: 87920 | elapsed time per iteration (ms): 14526.4 | learning rate: 2.435E-05 | global batch size: 32 | lm loss: 6.471479E+00 | loss scale: 16384.0 | grad norm: 107150.884 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4225/ 159576 | consumed samples: 87952 | elapsed time per iteration (ms): 14556.3 | learning rate: 2.436E-05 | global batch size: 32 | lm loss: 6.420123E+00 | loss scale: 16384.0 | grad norm: 118336.101 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4226/ 159576 | consumed samples: 87984 | elapsed time per iteration (ms): 14779.5 | learning rate: 2.437E-05 | global batch size: 32 | lm loss: 6.463729E+00 | loss scale: 16384.0 | grad norm: 105104.920 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4227/ 159576 | consumed samples: 88016 | elapsed time per iteration (ms): 14616.1 | learning rate: 2.438E-05 | global batch size: 32 | lm loss: 6.384348E+00 | loss scale: 16384.0 | grad norm: 121857.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4228/ 159576 | consumed samples: 88048 | elapsed time per iteration (ms): 14595.0 | learning rate: 2.439E-05 | global batch size: 32 | lm loss: 6.562186E+00 | loss scale: 16384.0 | grad norm: 120895.871 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4229/ 159576 | consumed samples: 88080 | elapsed time per iteration (ms): 14592.9 | learning rate: 2.439E-05 | global batch size: 32 | lm loss: 6.614166E+00 | loss scale: 16384.0 | grad norm: 141989.840 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4230/ 159576 | consumed samples: 88112 | elapsed time per iteration (ms): 14745.8 | learning rate: 2.440E-05 | global batch size: 32 | lm loss: 6.416856E+00 | loss scale: 16384.0 | grad norm: 135385.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4231/ 159576 | consumed samples: 88144 | elapsed time per iteration (ms): 14547.3 | learning rate: 2.441E-05 | global batch size: 32 | lm loss: 6.576384E+00 | loss scale: 16384.0 | grad norm: 129034.853 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4232/ 159576 | consumed samples: 88176 | elapsed time per iteration (ms): 14539.9 | learning rate: 2.442E-05 | global batch size: 32 | lm loss: 6.371499E+00 | loss scale: 16384.0 | grad norm: 102463.674 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4233/ 159576 | consumed samples: 88208 | elapsed time per iteration (ms): 14580.8 | learning rate: 2.443E-05 | global batch size: 32 | lm loss: 6.598085E+00 | loss scale: 16384.0 | grad norm: 105075.872 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4234/ 159576 | consumed samples: 88240 | elapsed time per iteration (ms): 14766.2 | learning rate: 2.444E-05 | global batch size: 32 | lm loss: 6.536204E+00 | loss scale: 16384.0 | grad norm: 109004.528 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4235/ 159576 | consumed samples: 88272 | elapsed time per iteration (ms): 14518.0 | learning rate: 2.445E-05 | global batch size: 32 | lm loss: 6.663161E+00 | loss scale: 16384.0 | grad norm: 197099.956 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4236/ 159576 | consumed samples: 88304 | elapsed time per iteration (ms): 14598.2 | learning rate: 2.446E-05 | global batch size: 32 | lm loss: 6.451008E+00 | loss scale: 16384.0 | grad norm: 125746.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4237/ 159576 | consumed samples: 88336 | elapsed time per iteration (ms): 14568.7 | learning rate: 2.447E-05 | global batch size: 32 | lm loss: 6.306778E+00 | loss scale: 16384.0 | grad norm: 145717.953 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4238/ 159576 | consumed samples: 88368 | elapsed time per iteration (ms): 14844.4 | learning rate: 2.447E-05 | global batch size: 32 | lm loss: 6.637146E+00 | loss scale: 16384.0 | grad norm: 161986.022 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4239/ 159576 | consumed samples: 88400 | elapsed time per iteration (ms): 14550.6 | learning rate: 2.448E-05 | global batch size: 32 | lm loss: 6.518569E+00 | loss scale: 16384.0 | grad norm: 114815.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4240/ 159576 | consumed samples: 88432 | elapsed time per iteration (ms): 14540.5 | learning rate: 2.449E-05 | global batch size: 32 | lm loss: 6.644086E+00 | loss scale: 16384.0 | grad norm: 127083.954 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4241/ 159576 | consumed samples: 88464 | elapsed time per iteration (ms): 14556.9 | learning rate: 2.450E-05 | global batch size: 32 | lm loss: 6.359149E+00 | loss scale: 16384.0 | grad norm: 119916.985 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4242/ 159576 | consumed samples: 88496 | elapsed time per iteration (ms): 14950.3 | learning rate: 2.451E-05 | global batch size: 32 | lm loss: 6.517668E+00 | loss scale: 16384.0 | grad norm: 116850.173 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4243/ 159576 | consumed samples: 88528 | elapsed time per iteration (ms): 14575.9 | learning rate: 2.452E-05 | global batch size: 32 | lm loss: 6.345152E+00 | loss scale: 16384.0 | grad norm: 106829.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4244/ 159576 | consumed samples: 88560 | elapsed time per iteration (ms): 14588.0 | learning rate: 2.453E-05 | global batch size: 32 | lm loss: 6.476923E+00 | loss scale: 16384.0 | grad norm: 121409.721 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4245/ 159576 | consumed samples: 88592 | elapsed time per iteration (ms): 14539.0 | learning rate: 2.454E-05 | global batch size: 32 | lm loss: 6.428369E+00 | loss scale: 16384.0 | grad norm: 99872.898 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4246/ 159576 | consumed samples: 88624 | elapsed time per iteration (ms): 15044.1 | learning rate: 2.455E-05 | global batch size: 32 | lm loss: 6.447415E+00 | loss scale: 16384.0 | grad norm: 102765.648 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4247/ 159576 | consumed samples: 88656 | elapsed time per iteration (ms): 14546.9 | learning rate: 2.455E-05 | global batch size: 32 | lm loss: 6.336578E+00 | loss scale: 16384.0 | grad norm: 90835.944 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4248/ 159576 | consumed samples: 88688 | elapsed time per iteration (ms): 14540.1 | learning rate: 2.456E-05 | global batch size: 32 | lm loss: 6.555513E+00 | loss scale: 16384.0 | grad norm: 104407.993 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4249/ 159576 | consumed samples: 88720 | elapsed time per iteration (ms): 14613.4 | learning rate: 2.457E-05 | global batch size: 32 | lm loss: 6.546042E+00 | loss scale: 16384.0 | grad norm: 115379.011 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4250/ 159576 | consumed samples: 88752 | elapsed time per iteration (ms): 14829.6 | learning rate: 2.458E-05 | global batch size: 32 | lm loss: 6.436588E+00 | loss scale: 16384.0 | grad norm: 107293.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4251/ 159576 | consumed samples: 88784 | elapsed time per iteration (ms): 14544.9 | learning rate: 2.459E-05 | global batch size: 32 | lm loss: 6.438442E+00 | loss scale: 16384.0 | grad norm: 105034.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4252/ 159576 | consumed samples: 88816 | elapsed time per iteration (ms): 14563.6 | learning rate: 2.460E-05 | global batch size: 32 | lm loss: 6.473608E+00 | loss scale: 16384.0 | grad norm: 84036.769 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4253/ 159576 | consumed samples: 88848 | elapsed time per iteration (ms): 14528.1 | learning rate: 2.461E-05 | global batch size: 32 | lm loss: 6.422614E+00 | loss scale: 16384.0 | grad norm: 95068.711 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4254/ 159576 | consumed samples: 88880 | elapsed time per iteration (ms): 14918.1 | learning rate: 2.462E-05 | global batch size: 32 | lm loss: 6.295578E+00 | loss scale: 16384.0 | grad norm: 114489.641 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4255/ 159576 | consumed samples: 88912 | elapsed time per iteration (ms): 14525.9 | learning rate: 2.463E-05 | global batch size: 32 | lm loss: 6.416272E+00 | loss scale: 16384.0 | grad norm: 91261.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4256/ 159576 | consumed samples: 88944 | elapsed time per iteration (ms): 14525.5 | learning rate: 2.463E-05 | global batch size: 32 | lm loss: 6.517479E+00 | loss scale: 32768.0 | grad norm: 94254.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4257/ 159576 | consumed samples: 88976 | elapsed time per iteration (ms): 14555.5 | learning rate: 2.464E-05 | global batch size: 32 | lm loss: 6.469455E+00 | loss scale: 32768.0 | grad norm: 174372.981 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4258/ 159576 | consumed samples: 89008 | elapsed time per iteration (ms): 14928.2 | learning rate: 2.465E-05 | global batch size: 32 | lm loss: 6.408867E+00 | loss scale: 32768.0 | grad norm: 205212.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4259/ 159576 | consumed samples: 89040 | elapsed time per iteration (ms): 14529.5 | learning rate: 2.466E-05 | global batch size: 32 | lm loss: 6.518348E+00 | loss scale: 32768.0 | grad norm: 175125.876 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4260/ 159576 | consumed samples: 89072 | elapsed time per iteration (ms): 14608.9 | learning rate: 2.467E-05 | global batch size: 32 | lm loss: 6.456366E+00 | loss scale: 32768.0 | grad norm: 180925.606 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4261/ 159576 | consumed samples: 89104 | elapsed time per iteration (ms): 14541.2 | learning rate: 2.468E-05 | global batch size: 32 | lm loss: 6.688640E+00 | loss scale: 32768.0 | grad norm: 205129.683 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4262/ 159576 | consumed samples: 89136 | elapsed time per iteration (ms): 14984.8 | learning rate: 2.469E-05 | global batch size: 32 | lm loss: 6.381848E+00 | loss scale: 32768.0 | grad norm: 194086.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4263/ 159576 | consumed samples: 89168 | elapsed time per iteration (ms): 14627.4 | learning rate: 2.470E-05 | global batch size: 32 | lm loss: 6.325251E+00 | loss scale: 32768.0 | grad norm: 200329.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4264/ 159576 | consumed samples: 89200 | elapsed time per iteration (ms): 14514.4 | learning rate: 2.471E-05 | global batch size: 32 | lm loss: 6.384187E+00 | loss scale: 32768.0 | grad norm: 206513.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4265/ 159576 | consumed samples: 89232 | elapsed time per iteration (ms): 14532.8 | learning rate: 2.471E-05 | global batch size: 32 | lm loss: 6.524798E+00 | loss scale: 32768.0 | grad norm: 207588.856 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4266/ 159576 | consumed samples: 89264 | elapsed time per iteration (ms): 14499.0 | learning rate: 2.472E-05 | global batch size: 32 | lm loss: 6.427965E+00 | loss scale: 32768.0 | grad norm: 270396.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4267/ 159576 | consumed samples: 89296 | elapsed time per iteration (ms): 14964.3 | learning rate: 2.473E-05 | global batch size: 32 | lm loss: 6.508441E+00 | loss scale: 32768.0 | grad norm: 256825.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4268/ 159576 | consumed samples: 89328 | elapsed time per iteration (ms): 14573.4 | learning rate: 2.474E-05 | global batch size: 32 | lm loss: 6.281446E+00 | loss scale: 32768.0 | grad norm: 175050.841 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4269/ 159576 | consumed samples: 89360 | elapsed time per iteration (ms): 14497.3 | learning rate: 2.475E-05 | global batch size: 32 | lm loss: 6.477619E+00 | loss scale: 32768.0 | grad norm: 194699.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4270/ 159576 | consumed samples: 89392 | elapsed time per iteration (ms): 14560.8 | learning rate: 2.476E-05 | global batch size: 32 | lm loss: 6.521669E+00 | loss scale: 32768.0 | grad norm: 204025.819 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4271/ 159576 | consumed samples: 89424 | elapsed time per iteration (ms): 14634.9 | learning rate: 2.477E-05 | global batch size: 32 | lm loss: 6.532991E+00 | loss scale: 32768.0 | grad norm: 218350.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4272/ 159576 | consumed samples: 89456 | elapsed time per iteration (ms): 14566.6 | learning rate: 2.478E-05 | global batch size: 32 | lm loss: 6.491451E+00 | loss scale: 32768.0 | grad norm: 196213.759 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4273/ 159576 | consumed samples: 89488 | elapsed time per iteration (ms): 14504.5 | learning rate: 2.479E-05 | global batch size: 32 | lm loss: 6.527338E+00 | loss scale: 32768.0 | grad norm: 254430.436 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4274/ 159576 | consumed samples: 89520 | elapsed time per iteration (ms): 14538.5 | learning rate: 2.479E-05 | global batch size: 32 | lm loss: 6.303001E+00 | loss scale: 32768.0 | grad norm: 189173.505 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4275/ 159576 | consumed samples: 89552 | elapsed time per iteration (ms): 14691.4 | learning rate: 2.480E-05 | global batch size: 32 | lm loss: 6.465518E+00 | loss scale: 32768.0 | grad norm: 266867.999 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4276/ 159576 | consumed samples: 89584 | elapsed time per iteration (ms): 14571.4 | learning rate: 2.481E-05 | global batch size: 32 | lm loss: 6.562708E+00 | loss scale: 32768.0 | grad norm: 213181.091 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4277/ 159576 | consumed samples: 89616 | elapsed time per iteration (ms): 14513.3 | learning rate: 2.482E-05 | global batch size: 32 | lm loss: 6.490031E+00 | loss scale: 32768.0 | grad norm: 200238.543 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4278/ 159576 | consumed samples: 89648 | elapsed time per iteration (ms): 14545.3 | learning rate: 2.483E-05 | global batch size: 32 | lm loss: 6.452188E+00 | loss scale: 32768.0 | grad norm: 209603.587 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4279/ 159576 | consumed samples: 89680 | elapsed time per iteration (ms): 14892.6 | learning rate: 2.484E-05 | global batch size: 32 | lm loss: 6.402837E+00 | loss scale: 32768.0 | grad norm: 213512.626 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4280/ 159576 | consumed samples: 89712 | elapsed time per iteration (ms): 14552.6 | learning rate: 2.485E-05 | global batch size: 32 | lm loss: 6.481530E+00 | loss scale: 32768.0 | grad norm: 218939.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4281/ 159576 | consumed samples: 89744 | elapsed time per iteration (ms): 14525.9 | learning rate: 2.486E-05 | global batch size: 32 | lm loss: 6.481557E+00 | loss scale: 32768.0 | grad norm: 211553.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4282/ 159576 | consumed samples: 89776 | elapsed time per iteration (ms): 14536.1 | learning rate: 2.487E-05 | global batch size: 32 | lm loss: 6.396571E+00 | loss scale: 32768.0 | grad norm: 200119.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4283/ 159576 | consumed samples: 89808 | elapsed time per iteration (ms): 14897.4 | learning rate: 2.487E-05 | global batch size: 32 | lm loss: 6.437448E+00 | loss scale: 32768.0 | grad norm: 211733.893 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4284/ 159576 | consumed samples: 89840 | elapsed time per iteration (ms): 14635.9 | learning rate: 2.488E-05 | global batch size: 32 | lm loss: 6.477830E+00 | loss scale: 32768.0 | grad norm: 273937.689 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4285/ 159576 | consumed samples: 89872 | elapsed time per iteration (ms): 14565.4 | learning rate: 2.489E-05 | global batch size: 32 | lm loss: 6.567824E+00 | loss scale: 32768.0 | grad norm: 210402.154 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4286/ 159576 | consumed samples: 89904 | elapsed time per iteration (ms): 14519.6 | learning rate: 2.490E-05 | global batch size: 32 | lm loss: 6.385768E+00 | loss scale: 32768.0 | grad norm: 203200.040 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4287/ 159576 | consumed samples: 89936 | elapsed time per iteration (ms): 14914.9 | learning rate: 2.491E-05 | global batch size: 32 | lm loss: 6.397992E+00 | loss scale: 32768.0 | grad norm: 182816.610 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4288/ 159576 | consumed samples: 89968 | elapsed time per iteration (ms): 14476.6 | learning rate: 2.492E-05 | global batch size: 32 | lm loss: 6.388610E+00 | loss scale: 32768.0 | grad norm: 199735.518 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4289/ 159576 | consumed samples: 90000 | elapsed time per iteration (ms): 14570.5 | learning rate: 2.493E-05 | global batch size: 32 | lm loss: 6.506209E+00 | loss scale: 32768.0 | grad norm: 206990.921 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4290/ 159576 | consumed samples: 90032 | elapsed time per iteration (ms): 14531.9 | learning rate: 2.494E-05 | global batch size: 32 | lm loss: 6.351604E+00 | loss scale: 32768.0 | grad norm: 204481.534 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4291/ 159576 | consumed samples: 90064 | elapsed time per iteration (ms): 14860.6 | learning rate: 2.495E-05 | global batch size: 32 | lm loss: 6.518882E+00 | loss scale: 32768.0 | grad norm: 236219.696 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4292/ 159576 | consumed samples: 90096 | elapsed time per iteration (ms): 14581.4 | learning rate: 2.495E-05 | global batch size: 32 | lm loss: 6.428777E+00 | loss scale: 32768.0 | grad norm: 187907.904 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4293/ 159576 | consumed samples: 90128 | elapsed time per iteration (ms): 14508.1 | learning rate: 2.496E-05 | global batch size: 32 | lm loss: 6.327142E+00 | loss scale: 32768.0 | grad norm: 204872.451 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4294/ 159576 | consumed samples: 90160 | elapsed time per iteration (ms): 14534.7 | learning rate: 2.497E-05 | global batch size: 32 | lm loss: 6.385339E+00 | loss scale: 32768.0 | grad norm: 233375.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4295/ 159576 | consumed samples: 90192 | elapsed time per iteration (ms): 14858.3 | learning rate: 2.498E-05 | global batch size: 32 | lm loss: 6.416627E+00 | loss scale: 32768.0 | grad norm: 222806.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4296/ 159576 | consumed samples: 90224 | elapsed time per iteration (ms): 14474.6 | learning rate: 2.499E-05 | global batch size: 32 | lm loss: 6.518059E+00 | loss scale: 32768.0 | grad norm: 226593.449 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4297/ 159576 | consumed samples: 90256 | elapsed time per iteration (ms): 14569.0 | learning rate: 2.500E-05 | global batch size: 32 | lm loss: 6.133147E+00 | loss scale: 32768.0 | grad norm: 267419.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4298/ 159576 | consumed samples: 90288 | elapsed time per iteration (ms): 14566.4 | learning rate: 2.501E-05 | global batch size: 32 | lm loss: 6.308548E+00 | loss scale: 32768.0 | grad norm: 204598.561 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4299/ 159576 | consumed samples: 90320 | elapsed time per iteration (ms): 14984.7 | learning rate: 2.502E-05 | global batch size: 32 | lm loss: 6.369866E+00 | loss scale: 32768.0 | grad norm: 221545.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4300/ 159576 | consumed samples: 90352 | elapsed time per iteration (ms): 14484.6 | learning rate: 2.503E-05 | global batch size: 32 | lm loss: 6.530766E+00 | loss scale: 32768.0 | grad norm: 267800.060 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4301/ 159576 | consumed samples: 90384 | elapsed time per iteration (ms): 14557.5 | learning rate: 2.503E-05 | global batch size: 32 | lm loss: 6.503004E+00 | loss scale: 32768.0 | grad norm: 228461.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4302/ 159576 | consumed samples: 90416 | elapsed time per iteration (ms): 14550.0 | learning rate: 2.504E-05 | global batch size: 32 | lm loss: 6.538440E+00 | loss scale: 32768.0 | grad norm: 190026.980 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4303/ 159576 | consumed samples: 90448 | elapsed time per iteration (ms): 14655.7 | learning rate: 2.505E-05 | global batch size: 32 | lm loss: 6.461242E+00 | loss scale: 32768.0 | grad norm: 211257.650 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4304/ 159576 | consumed samples: 90480 | elapsed time per iteration (ms): 14769.1 | learning rate: 2.506E-05 | global batch size: 32 | lm loss: 6.479248E+00 | loss scale: 32768.0 | grad norm: 198712.587 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4305/ 159576 | consumed samples: 90512 | elapsed time per iteration (ms): 14577.3 | learning rate: 2.507E-05 | global batch size: 32 | lm loss: 6.432651E+00 | loss scale: 32768.0 | grad norm: 206822.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4306/ 159576 | consumed samples: 90544 | elapsed time per iteration (ms): 14533.2 | learning rate: 2.508E-05 | global batch size: 32 | lm loss: 6.347961E+00 | loss scale: 32768.0 | grad norm: 195748.989 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4307/ 159576 | consumed samples: 90576 | elapsed time per iteration (ms): 14563.8 | learning rate: 2.509E-05 | global batch size: 32 | lm loss: 6.507642E+00 | loss scale: 32768.0 | grad norm: 218663.158 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4308/ 159576 | consumed samples: 90608 | elapsed time per iteration (ms): 14732.7 | learning rate: 2.510E-05 | global batch size: 32 | lm loss: 6.541059E+00 | loss scale: 32768.0 | grad norm: 228970.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4309/ 159576 | consumed samples: 90640 | elapsed time per iteration (ms): 14469.9 | learning rate: 2.511E-05 | global batch size: 32 | lm loss: 6.424891E+00 | loss scale: 32768.0 | grad norm: 196198.889 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4310/ 159576 | consumed samples: 90672 | elapsed time per iteration (ms): 14508.3 | learning rate: 2.511E-05 | global batch size: 32 | lm loss: 6.490376E+00 | loss scale: 32768.0 | grad norm: 215960.903 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4311/ 159576 | consumed samples: 90704 | elapsed time per iteration (ms): 14508.3 | learning rate: 2.512E-05 | global batch size: 32 | lm loss: 6.488754E+00 | loss scale: 32768.0 | grad norm: 195374.466 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4312/ 159576 | consumed samples: 90736 | elapsed time per iteration (ms): 14753.9 | learning rate: 2.513E-05 | global batch size: 32 | lm loss: 6.448671E+00 | loss scale: 32768.0 | grad norm: 227732.025 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4313/ 159576 | consumed samples: 90768 | elapsed time per iteration (ms): 14571.8 | learning rate: 2.514E-05 | global batch size: 32 | lm loss: 6.500753E+00 | loss scale: 32768.0 | grad norm: 266264.636 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4314/ 159576 | consumed samples: 90800 | elapsed time per iteration (ms): 14601.7 | learning rate: 2.515E-05 | global batch size: 32 | lm loss: 6.454448E+00 | loss scale: 32768.0 | grad norm: 224312.848 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4315/ 159576 | consumed samples: 90832 | elapsed time per iteration (ms): 14520.9 | learning rate: 2.516E-05 | global batch size: 32 | lm loss: 6.340928E+00 | loss scale: 32768.0 | grad norm: 252168.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4316/ 159576 | consumed samples: 90864 | elapsed time per iteration (ms): 14650.6 | learning rate: 2.517E-05 | global batch size: 32 | lm loss: 6.524774E+00 | loss scale: 32768.0 | grad norm: 233060.511 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4317/ 159576 | consumed samples: 90896 | elapsed time per iteration (ms): 14507.8 | learning rate: 2.518E-05 | global batch size: 32 | lm loss: 6.526123E+00 | loss scale: 32768.0 | grad norm: 228145.157 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4318/ 159576 | consumed samples: 90928 | elapsed time per iteration (ms): 14505.6 | learning rate: 2.518E-05 | global batch size: 32 | lm loss: 6.554380E+00 | loss scale: 32768.0 | grad norm: 215247.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-24 19:07:09] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition)
[2021-09-24 19:07:09] PULSE: tr8-104B is running for 13:14:58 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8]))
iteration 4319/ 159576 | consumed samples: 90960 | elapsed time per iteration (ms): 14496.4 | learning rate: 2.519E-05 | global batch size: 32 | lm loss: 6.312326E+00 | loss scale: 32768.0 | grad norm: 214751.055 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4320/ 159576 | consumed samples: 90992 | elapsed time per iteration (ms): 14941.6 | learning rate: 2.520E-05 | global batch size: 32 | lm loss: 6.452510E+00 | loss scale: 32768.0 | grad norm: 260142.714 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4321/ 159576 | consumed samples: 91024 | elapsed time per iteration (ms): 14618.7 | learning rate: 2.521E-05 | global batch size: 32 | lm loss: 6.420647E+00 | loss scale: 32768.0 | grad norm: 225655.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4322/ 159576 | consumed samples: 91056 | elapsed time per iteration (ms): 14566.6 | learning rate: 2.522E-05 | global batch size: 32 | lm loss: 6.402806E+00 | loss scale: 32768.0 | grad norm: 291928.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4323/ 159576 | consumed samples: 91088 | elapsed time per iteration (ms): 14498.7 | learning rate: 2.523E-05 | global batch size: 32 | lm loss: 6.391022E+00 | loss scale: 32768.0 | grad norm: 237551.777 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4324/ 159576 | consumed samples: 91120 | elapsed time per iteration (ms): 15211.7 | learning rate: 2.524E-05 | global batch size: 32 | lm loss: 6.430393E+00 | loss scale: 32768.0 | grad norm: 234733.593 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4325/ 159576 | consumed samples: 91152 | elapsed time per iteration (ms): 14439.1 | learning rate: 2.525E-05 | global batch size: 32 | lm loss: 6.406878E+00 | loss scale: 32768.0 | grad norm: 212091.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4326/ 159576 | consumed samples: 91184 | elapsed time per iteration (ms): 14533.1 | learning rate: 2.526E-05 | global batch size: 32 | lm loss: 6.439167E+00 | loss scale: 32768.0 | grad norm: 244000.757 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4327/ 159576 | consumed samples: 91216 | elapsed time per iteration (ms): 14508.9 | learning rate: 2.526E-05 | global batch size: 32 | lm loss: 6.334565E+00 | loss scale: 32768.0 | grad norm: 183767.589 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4328/ 159576 | consumed samples: 91248 | elapsed time per iteration (ms): 14921.5 | learning rate: 2.527E-05 | global batch size: 32 | lm loss: 6.456017E+00 | loss scale: 32768.0 | grad norm: 239736.759 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4329/ 159576 | consumed samples: 91280 | elapsed time per iteration (ms): 14572.2 | learning rate: 2.528E-05 | global batch size: 32 | lm loss: 6.367092E+00 | loss scale: 32768.0 | grad norm: 195126.741 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4330/ 159576 | consumed samples: 91312 | elapsed time per iteration (ms): 14531.1 | learning rate: 2.529E-05 | global batch size: 32 | lm loss: 6.383262E+00 | loss scale: 32768.0 | grad norm: 208256.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4331/ 159576 | consumed samples: 91344 | elapsed time per iteration (ms): 14591.9 | learning rate: 2.530E-05 | global batch size: 32 | lm loss: 6.502596E+00 | loss scale: 32768.0 | grad norm: 248824.057 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4332/ 159576 | consumed samples: 91376 | elapsed time per iteration (ms): 14794.2 | learning rate: 2.531E-05 | global batch size: 32 | lm loss: 6.386366E+00 | loss scale: 32768.0 | grad norm: 223413.013 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4333/ 159576 | consumed samples: 91408 | elapsed time per iteration (ms): 14447.8 | learning rate: 2.532E-05 | global batch size: 32 | lm loss: 6.470964E+00 | loss scale: 32768.0 | grad norm: 220869.102 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4334/ 159576 | consumed samples: 91440 | elapsed time per iteration (ms): 14523.5 | learning rate: 2.533E-05 | global batch size: 32 | lm loss: 6.423388E+00 | loss scale: 32768.0 | grad norm: 204896.062 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4335/ 159576 | consumed samples: 91472 | elapsed time per iteration (ms): 14548.8 | learning rate: 2.534E-05 | global batch size: 32 | lm loss: 6.516037E+00 | loss scale: 32768.0 | grad norm: 214455.132 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4336/ 159576 | consumed samples: 91504 | elapsed time per iteration (ms): 14925.7 | learning rate: 2.534E-05 | global batch size: 32 | lm loss: 6.420337E+00 | loss scale: 32768.0 | grad norm: 252272.858 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4337/ 159576 | consumed samples: 91536 | elapsed time per iteration (ms): 14576.6 | learning rate: 2.535E-05 | global batch size: 32 | lm loss: 6.464952E+00 | loss scale: 32768.0 | grad norm: 193893.530 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4338/ 159576 | consumed samples: 91568 | elapsed time per iteration (ms): 14502.1 | learning rate: 2.536E-05 | global batch size: 32 | lm loss: 6.492158E+00 | loss scale: 32768.0 | grad norm: 243709.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4339/ 159576 | consumed samples: 91600 | elapsed time per iteration (ms): 14503.5 | learning rate: 2.537E-05 | global batch size: 32 | lm loss: 6.239275E+00 | loss scale: 32768.0 | grad norm: 206242.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4340/ 159576 | consumed samples: 91632 | elapsed time per iteration (ms): 14881.4 | learning rate: 2.538E-05 | global batch size: 32 | lm loss: 6.484446E+00 | loss scale: 32768.0 | grad norm: 213552.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4341/ 159576 | consumed samples: 91664 | elapsed time per iteration (ms): 14651.1 | learning rate: 2.539E-05 | global batch size: 32 | lm loss: 6.419237E+00 | loss scale: 32768.0 | grad norm: 210520.111 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4342/ 159576 | consumed samples: 91696 | elapsed time per iteration (ms): 14512.3 | learning rate: 2.540E-05 | global batch size: 32 | lm loss: 6.452721E+00 | loss scale: 32768.0 | grad norm: 238634.139 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4343/ 159576 | consumed samples: 91728 | elapsed time per iteration (ms): 14558.7 | learning rate: 2.541E-05 | global batch size: 32 | lm loss: 6.347074E+00 | loss scale: 32768.0 | grad norm: 202447.417 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4344/ 159576 | consumed samples: 91760 | elapsed time per iteration (ms): 14594.4 | learning rate: 2.542E-05 | global batch size: 32 | lm loss: 6.520543E+00 | loss scale: 32768.0 
| grad norm: 239073.554 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4345/ 159576 | consumed samples: 91792 | elapsed time per iteration (ms): 14908.5 | learning rate: 2.542E-05 | global batch size: 32 | lm loss: 6.421722E+00 | loss scale: 32768.0 | grad norm: 217284.913 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4346/ 159576 | consumed samples: 91824 | elapsed time per iteration (ms): 14533.0 | learning rate: 2.543E-05 | global batch size: 32 | lm loss: 6.272108E+00 | loss scale: 32768.0 | grad norm: 200271.872 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4347/ 159576 | consumed samples: 91856 | elapsed time per iteration (ms): 14569.7 | learning rate: 2.544E-05 | global batch size: 32 | lm loss: 6.532617E+00 | loss scale: 32768.0 | grad norm: 194761.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4348/ 159576 | consumed samples: 91888 | elapsed time per iteration (ms): 14475.9 | learning rate: 2.545E-05 | global batch size: 32 | lm loss: 6.471928E+00 | loss scale: 32768.0 | grad norm: 217213.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4349/ 159576 | consumed samples: 91920 | elapsed time per iteration (ms): 14760.6 | learning rate: 2.546E-05 | global batch size: 32 | lm loss: 6.416161E+00 | loss scale: 32768.0 | grad norm: 224313.842 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4350/ 159576 | consumed samples: 91952 | elapsed time per iteration (ms): 14554.3 | learning rate: 2.547E-05 | global batch size: 32 | lm loss: 6.550965E+00 | loss scale: 32768.0 | grad norm: 241887.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4351/ 159576 | consumed samples: 91984 | elapsed time per iteration (ms): 14563.9 | learning rate: 2.548E-05 | global batch size: 32 | lm loss: 6.496109E+00 | loss scale: 32768.0 | grad norm: 216683.843 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4352/ 159576 | consumed samples: 92016 | elapsed time per iteration (ms): 14514.3 | learning rate: 2.549E-05 | global batch size: 32 | lm loss: 6.359037E+00 | loss scale: 32768.0 | grad norm: 205500.964 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4353/ 159576 | consumed samples: 92048 | elapsed time per iteration (ms): 14703.1 | learning rate: 2.550E-05 | global batch size: 32 | lm loss: 6.333501E+00 | loss scale: 32768.0 | grad norm: 326501.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4354/ 159576 | consumed samples: 92080 | elapsed time per iteration (ms): 14558.2 | learning rate: 2.550E-05 | global batch size: 32 | lm loss: 6.455669E+00 | loss scale: 32768.0 | grad norm: 254904.658 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4355/ 159576 | consumed samples: 92112 | elapsed time per iteration (ms): 14511.5 | learning rate: 2.551E-05 | global batch size: 32 | lm loss: 6.509322E+00 | loss scale: 32768.0 | grad norm: 237041.501 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4356/ 159576 | consumed samples: 92144 | elapsed time per 
iteration (ms): 14539.0 | learning rate: 2.552E-05 | global batch size: 32 | lm loss: 6.356802E+00 | loss scale: 32768.0 | grad norm: 268871.419 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4357/ 159576 | consumed samples: 92176 | elapsed time per iteration (ms): 14822.4 | learning rate: 2.553E-05 | global batch size: 32 | lm loss: 6.599571E+00 | loss scale: 32768.0 | grad norm: 283473.183 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4358/ 159576 | consumed samples: 92208 | elapsed time per iteration (ms): 14612.7 | learning rate: 2.554E-05 | global batch size: 32 | lm loss: 6.308304E+00 | loss scale: 32768.0 | grad norm: 231784.921 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4359/ 159576 | consumed samples: 92240 | elapsed time per iteration (ms): 14524.9 | learning rate: 2.555E-05 | global batch size: 32 | lm loss: 6.395612E+00 | loss scale: 32768.0 | grad norm: 270045.717 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4360/ 159576 | consumed samples: 92272 | elapsed time per iteration (ms): 14601.7 | learning rate: 2.556E-05 | global batch size: 32 | lm loss: 6.525626E+00 | loss scale: 32768.0 | grad norm: 275256.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4361/ 159576 | consumed samples: 92304 | elapsed time per iteration (ms): 14951.2 | learning rate: 2.557E-05 | global batch size: 32 | lm loss: 6.457727E+00 | loss scale: 32768.0 | grad norm: 277346.905 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4362/ 159576 | consumed samples: 92336 | elapsed time per iteration (ms): 14507.2 | learning rate: 2.558E-05 | global batch size: 32 | lm loss: 6.423290E+00 | loss scale: 32768.0 | grad norm: 259149.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4363/ 159576 | consumed samples: 92368 | elapsed time per iteration (ms): 14519.9 | learning rate: 2.558E-05 | global batch size: 32 | lm loss: 6.385529E+00 | loss scale: 32768.0 | grad norm: 288729.160 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4364/ 159576 | consumed samples: 92400 | elapsed time per iteration (ms): 14590.0 | learning rate: 2.559E-05 | global batch size: 32 | lm loss: 6.344237E+00 | loss scale: 32768.0 | grad norm: 224867.437 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4365/ 159576 | consumed samples: 92432 | elapsed time per iteration (ms): 15022.1 | learning rate: 2.560E-05 | global batch size: 32 | lm loss: 6.361878E+00 | loss scale: 32768.0 | grad norm: 317761.599 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4366/ 159576 | consumed samples: 92464 | elapsed time per iteration (ms): 14751.4 | learning rate: 2.561E-05 | global batch size: 32 | lm loss: 6.330537E+00 | loss scale: 32768.0 | grad norm: 265015.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4367/ 159576 | consumed samples: 92496 | elapsed time per iteration (ms): 14614.0 | learning rate: 2.562E-05 | global batch size: 32 | lm loss: 6.148376E+00 | loss scale: 32768.0 | grad norm: 264202.339 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4368/ 159576 | consumed samples: 92528 | elapsed time per iteration (ms): 14584.5 | learning rate: 2.563E-05 | global batch size: 32 | lm loss: 6.479382E+00 | loss scale: 32768.0 | grad norm: 264375.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4369/ 159576 | consumed samples: 92560 | elapsed time per iteration (ms): 14918.5 | learning rate: 2.564E-05 | global batch size: 32 | lm loss: 6.363014E+00 | loss scale: 32768.0 | grad norm: 226102.568 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4370/ 159576 | consumed samples: 92592 | elapsed time per iteration (ms): 14489.4 | learning rate: 2.565E-05 | global batch size: 32 | lm loss: 6.437625E+00 | loss scale: 32768.0 | grad norm: 280139.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4371/ 159576 | consumed samples: 92624 | elapsed time per iteration (ms): 14515.3 | learning rate: 2.566E-05 | global batch size: 32 | lm loss: 6.394330E+00 | loss scale: 32768.0 | grad norm: 290041.946 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4372/ 159576 | consumed samples: 92656 | elapsed time per iteration (ms): 14519.6 | learning rate: 2.566E-05 | global batch size: 32 | lm loss: 6.430163E+00 | loss scale: 32768.0 | grad norm: 318528.997 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4373/ 159576 | consumed samples: 92688 | elapsed time per iteration (ms): 14816.9 | learning rate: 2.567E-05 | global batch size: 32 | lm loss: 6.494810E+00 | loss scale: 32768.0 | grad norm: 279939.060 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4374/ 159576 | consumed samples: 92720 | elapsed time per iteration (ms): 14615.4 | learning rate: 2.568E-05 | global batch size: 32 | lm loss: 6.431265E+00 | loss scale: 32768.0 | grad norm: 260943.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4375/ 159576 | consumed samples: 92752 | elapsed time per iteration (ms): 14539.2 | learning rate: 2.569E-05 | global batch size: 32 | lm loss: 6.365846E+00 | loss scale: 32768.0 | grad norm: 614516.527 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4376/ 159576 | consumed samples: 92784 | elapsed time per iteration (ms): 14560.9 | learning rate: 2.570E-05 | global batch size: 32 | lm loss: 6.306572E+00 | loss scale: 32768.0 | grad norm: 303539.975 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4377/ 159576 | consumed samples: 92816 | elapsed time per iteration (ms): 14894.6 | learning rate: 2.571E-05 | global batch size: 32 | lm loss: 6.444806E+00 | loss scale: 32768.0 | grad norm: 305405.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4378/ 159576 | consumed samples: 92848 | elapsed time per iteration (ms): 14498.0 | learning rate: 2.572E-05 | global batch size: 32 | lm loss: 6.475850E+00 | loss scale: 32768.0 | grad norm: 302245.775 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4379/ 159576 | consumed samples: 92880 | elapsed time per iteration (ms): 14519.5 | learning rate: 2.573E-05 | global 
batch size: 32 | lm loss: 6.470803E+00 | loss scale: 32768.0 | grad norm: 302163.447 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4380/ 159576 | consumed samples: 92912 | elapsed time per iteration (ms): 14547.1 | learning rate: 2.574E-05 | global batch size: 32 | lm loss: 6.285831E+00 | loss scale: 32768.0 | grad norm: 245533.159 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4381/ 159576 | consumed samples: 92944 | elapsed time per iteration (ms): 14903.6 | learning rate: 2.574E-05 | global batch size: 32 | lm loss: 6.382543E+00 | loss scale: 32768.0 | grad norm: 256847.499 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4382/ 159576 | consumed samples: 92976 | elapsed time per iteration (ms): 14746.3 | learning rate: 2.575E-05 | global batch size: 32 | lm loss: 6.377112E+00 | loss scale: 32768.0 | grad norm: 234822.067 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4383/ 159576 | consumed samples: 93008 | elapsed time per iteration (ms): 14580.0 | learning rate: 2.576E-05 | global batch size: 32 | lm loss: 6.412641E+00 | loss scale: 32768.0 | grad norm: 343040.768 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4384/ 159576 | consumed samples: 93040 | elapsed time per iteration (ms): 14506.7 | learning rate: 2.577E-05 | global batch size: 32 | lm loss: 6.416348E+00 | loss scale: 32768.0 | grad norm: 291818.464 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4385/ 159576 | consumed samples: 93072 | elapsed time per iteration (ms): 14512.2 | learning rate: 2.578E-05 | global batch size: 32 | lm loss: 6.425752E+00 | loss scale: 32768.0 | grad norm: 323662.796 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4386/ 159576 | consumed samples: 93104 | elapsed time per iteration (ms): 14928.6 | learning rate: 2.579E-05 | global batch size: 32 | lm loss: 6.318911E+00 | loss scale: 32768.0 | grad norm: 305616.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4387/ 159576 | consumed samples: 93136 | elapsed time per iteration (ms): 14506.3 | learning rate: 2.580E-05 | global batch size: 32 | lm loss: 6.531947E+00 | loss scale: 32768.0 | grad norm: 350201.540 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4388/ 159576 | consumed samples: 93168 | elapsed time per iteration (ms): 14556.8 | learning rate: 2.581E-05 | global batch size: 32 | lm loss: 6.376329E+00 | loss scale: 32768.0 | grad norm: 345044.523 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4389/ 159576 | consumed samples: 93200 | elapsed time per iteration (ms): 14537.0 | learning rate: 2.582E-05 | global batch size: 32 | lm loss: 6.381351E+00 | loss scale: 32768.0 | grad norm: 285108.825 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4390/ 159576 | consumed samples: 93232 | elapsed time per iteration (ms): 14792.9 | learning rate: 2.582E-05 | global batch size: 32 | lm loss: 6.367733E+00 | loss scale: 32768.0 | grad norm: 443607.853 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 
4391/ 159576 | consumed samples: 93264 | elapsed time per iteration (ms): 14536.7 | learning rate: 2.583E-05 | global batch size: 32 | lm loss: 6.404822E+00 | loss scale: 32768.0 | grad norm: 266018.610 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4392/ 159576 | consumed samples: 93296 | elapsed time per iteration (ms): 14465.3 | learning rate: 2.584E-05 | global batch size: 32 | lm loss: 6.460493E+00 | loss scale: 32768.0 | grad norm: 388305.684 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4393/ 159576 | consumed samples: 93328 | elapsed time per iteration (ms): 14549.7 | learning rate: 2.585E-05 | global batch size: 32 | lm loss: 6.312160E+00 | loss scale: 32768.0 | grad norm: 289444.907 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4394/ 159576 | consumed samples: 93360 | elapsed time per iteration (ms): 14712.4 | learning rate: 2.586E-05 | global batch size: 32 | lm loss: 6.447091E+00 | loss scale: 32768.0 | grad norm: 310866.794 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4395/ 159576 | consumed samples: 93392 | elapsed time per iteration (ms): 14507.9 | learning rate: 2.587E-05 | global batch size: 32 | lm loss: 6.358830E+00 | loss scale: 32768.0 | grad norm: 254147.069 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4396/ 159576 | consumed samples: 93424 | elapsed time per iteration (ms): 14549.6 | learning rate: 2.588E-05 | global batch size: 32 | lm loss: 6.406147E+00 | loss scale: 32768.0 | grad norm: 368220.982 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4397/ 159576 | consumed samples: 93456 | elapsed time per iteration (ms): 14535.1 | learning rate: 2.589E-05 | global batch size: 32 | lm loss: 6.511951E+00 | loss scale: 32768.0 | grad norm: 306021.916 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4398/ 159576 | consumed samples: 93488 | elapsed time per iteration (ms): 14834.9 | learning rate: 2.589E-05 | global batch size: 32 | lm loss: 6.344939E+00 | loss scale: 32768.0 | grad norm: 244440.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4399/ 159576 | consumed samples: 93520 | elapsed time per iteration (ms): 14561.9 | learning rate: 2.590E-05 | global batch size: 32 | lm loss: 6.408576E+00 | loss scale: 32768.0 | grad norm: 331789.025 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4400/ 159576 | consumed samples: 93552 | elapsed time per iteration (ms): 14527.0 | learning rate: 2.591E-05 | global batch size: 32 | lm loss: 6.405599E+00 | loss scale: 32768.0 | grad norm: 389927.053 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4401/ 159576 | consumed samples: 93584 | elapsed time per iteration (ms): 14530.9 | learning rate: 2.592E-05 | global batch size: 32 | lm loss: 6.461980E+00 | loss scale: 32768.0 | grad norm: 344518.886 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4402/ 159576 | consumed samples: 93616 | elapsed time per iteration (ms): 15042.1 | learning rate: 2.593E-05 | global batch size: 32 | lm loss: 6.416601E+00 | loss scale: 32768.0 | grad 
norm: 310590.140 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4403/ 159576 | consumed samples: 93648 | elapsed time per iteration (ms): 14634.8 | learning rate: 2.594E-05 | global batch size: 32 | lm loss: 6.546180E+00 | loss scale: 32768.0 | grad norm: 267385.444 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4404/ 159576 | consumed samples: 93680 | elapsed time per iteration (ms): 14549.2 | learning rate: 2.595E-05 | global batch size: 32 | lm loss: 6.399436E+00 | loss scale: 32768.0 | grad norm: 298662.044 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4405/ 159576 | consumed samples: 93712 | elapsed time per iteration (ms): 14489.5 | learning rate: 2.596E-05 | global batch size: 32 | lm loss: 6.306044E+00 | loss scale: 32768.0 | grad norm: 302499.736 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4406/ 159576 | consumed samples: 93744 | elapsed time per iteration (ms): 14963.1 | learning rate: 2.597E-05 | global batch size: 32 | lm loss: 6.504598E+00 | loss scale: 32768.0 | grad norm: 315577.594 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4407/ 159576 | consumed samples: 93776 | elapsed time per iteration (ms): 14516.0 | learning rate: 2.597E-05 | global batch size: 32 | lm loss: 6.229925E+00 | loss scale: 32768.0 | grad norm: 238182.668 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4408/ 159576 | consumed samples: 93808 | elapsed time per iteration (ms): 14496.6 | learning rate: 2.598E-05 | global batch size: 32 | lm loss: 6.414362E+00 | loss scale: 32768.0 | grad norm: 274509.689 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4409/ 159576 | consumed samples: 93840 | elapsed time per iteration (ms): 14543.5 | learning rate: 2.599E-05 | global batch size: 32 | lm loss: 6.355350E+00 | loss scale: 32768.0 | grad norm: 288329.828 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4410/ 159576 | consumed samples: 93872 | elapsed time per iteration (ms): 14875.5 | learning rate: 2.600E-05 | global batch size: 32 | lm loss: 6.366935E+00 | loss scale: 32768.0 | grad norm: 252983.122 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4411/ 159576 | consumed samples: 93904 | elapsed time per iteration (ms): 14456.2 | learning rate: 2.601E-05 | global batch size: 32 | lm loss: 6.458515E+00 | loss scale: 32768.0 | grad norm: 210575.780 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4412/ 159576 | consumed samples: 93936 | elapsed time per iteration (ms): 14560.7 | learning rate: 2.602E-05 | global batch size: 32 | lm loss: 6.472146E+00 | loss scale: 32768.0 | grad norm: 237114.094 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4413/ 159576 | consumed samples: 93968 | elapsed time per iteration (ms): 14587.5 | learning rate: 2.603E-05 | global batch size: 32 | lm loss: 6.359771E+00 | loss scale: 32768.0 | grad norm: 252911.801 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4414/ 159576 | consumed samples: 94000 | elapsed time per iteration 
(ms): 14804.6 | learning rate: 2.604E-05 | global batch size: 32 | lm loss: 6.563889E+00 | loss scale: 32768.0 | grad norm: 296794.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4415/ 159576 | consumed samples: 94032 | elapsed time per iteration (ms): 14512.9 | learning rate: 2.605E-05 | global batch size: 32 | lm loss: 6.413787E+00 | loss scale: 32768.0 | grad norm: 272034.826 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4416/ 159576 | consumed samples: 94064 | elapsed time per iteration (ms): 14494.5 | learning rate: 2.605E-05 | global batch size: 32 | lm loss: 6.443899E+00 | loss scale: 32768.0 | grad norm: 290284.950 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4417/ 159576 | consumed samples: 94096 | elapsed time per iteration (ms): 14536.8 | learning rate: 2.606E-05 | global batch size: 32 | lm loss: 6.472334E+00 | loss scale: 32768.0 | grad norm: 248961.089 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4418/ 159576 | consumed samples: 94128 | elapsed time per iteration (ms): 14975.6 | learning rate: 2.607E-05 | global batch size: 32 | lm loss: 6.557878E+00 | loss scale: 32768.0 | grad norm: 330814.857 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4419/ 159576 | consumed samples: 94160 | elapsed time per iteration (ms): 14477.8 | learning rate: 2.608E-05 | global batch size: 32 | lm loss: 6.499488E+00 | loss scale: 32768.0 | grad norm: 268804.004 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4420/ 159576 | consumed samples: 94192 | elapsed time per iteration (ms): 14628.8 | learning rate: 2.609E-05 | global batch size: 32 | lm loss: 6.312944E+00 | loss scale: 32768.0 | grad norm: 264253.854 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4421/ 159576 | consumed samples: 94224 | elapsed time per iteration (ms): 14519.9 | learning rate: 2.610E-05 | global batch size: 32 | lm loss: 6.392362E+00 | loss scale: 32768.0 | grad norm: 255470.733 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4422/ 159576 | consumed samples: 94256 | elapsed time per iteration (ms): 14805.5 | learning rate: 2.611E-05 | global batch size: 32 | lm loss: 6.375703E+00 | loss scale: 32768.0 | grad norm: 246267.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4423/ 159576 | consumed samples: 94288 | elapsed time per iteration (ms): 14680.3 | learning rate: 2.612E-05 | global batch size: 32 | lm loss: 6.523773E+00 | loss scale: 32768.0 | grad norm: 281090.751 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4424/ 159576 | consumed samples: 94320 | elapsed time per iteration (ms): 7706.4 | learning rate: 2.612E-05 | global batch size: 32 | lm loss: 6.355268E+00 | loss scale: 32768.0 | grad norm: 281090.751 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4425/ 159576 | consumed samples: 94352 | elapsed time per iteration (ms): 13992.5 | learning rate: 2.613E-05 | global batch size: 32 | lm loss: 6.391113E+00 | loss scale: 32768.0 | grad norm: 235806.214 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | time (ms) iteration 4426/ 159576 | consumed samples: 94384 | elapsed time per iteration (ms): 14643.4 | learning rate: 2.613E-05 | global batch size: 32 | lm loss: 6.483145E+00 | loss scale: 32768.0 | grad norm: 316001.533 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4427/ 159576 | consumed samples: 94416 | elapsed time per iteration (ms): 14931.0 | learning rate: 2.614E-05 | global batch size: 32 | lm loss: 6.419625E+00 | loss scale: 32768.0 | grad norm: 595148.752 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4428/ 159576 | consumed samples: 94448 | elapsed time per iteration (ms): 14542.3 | learning rate: 2.615E-05 | global batch size: 32 | lm loss: 6.463273E+00 | loss scale: 32768.0 | grad norm: 310708.077 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4429/ 159576 | consumed samples: 94480 | elapsed time per iteration (ms): 14522.5 | learning rate: 2.616E-05 | global batch size: 32 | lm loss: 6.427548E+00 | loss scale: 32768.0 | grad norm: 324018.149 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4430/ 159576 | consumed samples: 94512 | elapsed time per iteration (ms): 14489.9 | learning rate: 2.617E-05 | global batch size: 32 | lm loss: 6.385033E+00 | loss scale: 32768.0 | grad norm: 244981.121 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4431/ 159576 | consumed samples: 94560 | elapsed time per iteration (ms): 15763.7 | learning rate: 2.618E-05 | global batch size: 48 | lm loss: 6.545300E+00 | loss scale: 32768.0 | grad norm: 209680.886 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4432/ 159576 | consumed samples: 94608 | elapsed time per iteration (ms): 15487.4 | learning rate: 2.620E-05 | global batch size: 48 | lm loss: 6.439948E+00 | loss scale: 32768.0 | grad norm: 242738.510 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4433/ 159576 | consumed samples: 94656 | elapsed time per iteration (ms): 15516.6 | learning rate: 2.621E-05 | global batch size: 48 | lm loss: 6.392755E+00 | loss scale: 32768.0 | grad norm: 221617.752 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4434/ 159576 | consumed samples: 94704 | elapsed time per iteration (ms): 15531.5 | learning rate: 2.622E-05 | global batch size: 48 | lm loss: 6.430658E+00 | loss scale: 32768.0 | grad norm: 237786.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4435/ 159576 | consumed samples: 94752 | elapsed time per iteration (ms): 15905.6 | learning rate: 2.624E-05 | global batch size: 48 | lm loss: 6.556681E+00 | loss scale: 32768.0 | grad norm: 268817.064 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4436/ 159576 | consumed samples: 94800 | elapsed time per iteration (ms): 15557.4 | learning rate: 2.625E-05 | global batch size: 48 | lm loss: 6.284402E+00 | loss scale: 32768.0 | grad norm: 217583.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4437/ 159576 | consumed samples: 94848 | elapsed time per iteration (ms): 15418.7 | learning rate: 2.626E-05 | global batch size: 48 | lm 
loss: 6.449813E+00 | loss scale: 32768.0 | grad norm: 250831.113 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4438/ 159576 | consumed samples: 94896 | elapsed time per iteration (ms): 15465.2 | learning rate: 2.628E-05 | global batch size: 48 | lm loss: 6.524204E+00 | loss scale: 32768.0 | grad norm: 237741.486 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4439/ 159576 | consumed samples: 94944 | elapsed time per iteration (ms): 15664.4 | learning rate: 2.629E-05 | global batch size: 48 | lm loss: 6.426958E+00 | loss scale: 32768.0 | grad norm: 275670.650 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4440/ 159576 | consumed samples: 94992 | elapsed time per iteration (ms): 15485.6 | learning rate: 2.630E-05 | global batch size: 48 | lm loss: 6.312765E+00 | loss scale: 32768.0 | grad norm: 236643.110 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4441/ 159576 | consumed samples: 95040 | elapsed time per iteration (ms): 15554.2 | learning rate: 2.632E-05 | global batch size: 48 | lm loss: 6.353696E+00 | loss scale: 32768.0 | grad norm: 244108.176 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4442/ 159576 | consumed samples: 95088 | elapsed time per iteration (ms): 15559.7 | learning rate: 2.633E-05 | global batch size: 48 | lm loss: 6.390371E+00 | loss scale: 32768.0 | grad norm: 415315.134 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4443/ 159576 | consumed samples: 95136 | elapsed time per iteration (ms): 15762.5 | learning rate: 2.634E-05 | global batch size: 48 | lm loss: 6.406565E+00 | loss scale: 32768.0 | grad norm: 379916.616 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4444/ 159576 | consumed samples: 95184 | elapsed time per iteration (ms): 15453.3 | learning rate: 2.636E-05 | global batch size: 48 | lm loss: 6.429417E+00 | loss scale: 32768.0 | grad norm: 221219.524 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4445/ 159576 | consumed samples: 95232 | elapsed time per iteration (ms): 15417.8 | learning rate: 2.637E-05 | global batch size: 48 | lm loss: 6.443903E+00 | loss scale: 32768.0 | grad norm: 296633.656 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4446/ 159576 | consumed samples: 95280 | elapsed time per iteration (ms): 15443.7 | learning rate: 2.638E-05 | global batch size: 48 | lm loss: 6.532698E+00 | loss scale: 32768.0 | grad norm: 269367.053 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4447/ 159576 | consumed samples: 95328 | elapsed time per iteration (ms): 15690.5 | learning rate: 2.640E-05 | global batch size: 48 | lm loss: 6.390007E+00 | loss scale: 32768.0 | grad norm: 235234.160 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4448/ 159576 | consumed samples: 95376 | elapsed time per iteration (ms): 15488.0 | learning rate: 2.641E-05 | global batch size: 48 | lm loss: 6.393896E+00 | loss scale: 32768.0 | grad norm: 210963.912 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4449/ 159576 | 
consumed samples: 95424 | elapsed time per iteration (ms): 15546.6 | learning rate: 2.642E-05 | global batch size: 48 | lm loss: 6.387472E+00 | loss scale: 32768.0 | grad norm: 214989.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4450/ 159576 | consumed samples: 95472 | elapsed time per iteration (ms): 15940.5 | learning rate: 2.644E-05 | global batch size: 48 | lm loss: 6.395288E+00 | loss scale: 32768.0 | grad norm: 214649.184 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4451/ 159576 | consumed samples: 95520 | elapsed time per iteration (ms): 15450.6 | learning rate: 2.645E-05 | global batch size: 48 | lm loss: 6.391924E+00 | loss scale: 32768.0 | grad norm: 256872.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4452/ 159576 | consumed samples: 95568 | elapsed time per iteration (ms): 15411.8 | learning rate: 2.646E-05 | global batch size: 48 | lm loss: 6.372116E+00 | loss scale: 32768.0 | grad norm: 227618.006 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4453/ 159576 | consumed samples: 95616 | elapsed time per iteration (ms): 15430.5 | learning rate: 2.648E-05 | global batch size: 48 | lm loss: 6.411846E+00 | loss scale: 32768.0 | grad norm: 239941.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4454/ 159576 | consumed samples: 95664 | elapsed time per iteration (ms): 15763.6 | learning rate: 2.649E-05 | global batch size: 48 | lm loss: 6.412562E+00 | loss scale: 32768.0 | grad norm: 229907.704 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4455/ 159576 | consumed samples: 95712 | elapsed time per iteration (ms): 15524.7 | learning rate: 2.650E-05 | global batch size: 48 | lm loss: 6.428136E+00 | loss scale: 32768.0 | grad norm: 223866.778 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4456/ 159576 | consumed samples: 95760 | elapsed time per iteration (ms): 15490.3 | learning rate: 2.652E-05 | global batch size: 48 | lm loss: 6.476852E+00 | loss scale: 32768.0 | grad norm: 263813.676 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4457/ 159576 | consumed samples: 95808 | elapsed time per iteration (ms): 15514.4 | learning rate: 2.653E-05 | global batch size: 48 | lm loss: 6.382901E+00 | loss scale: 32768.0 | grad norm: 257590.659 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4458/ 159576 | consumed samples: 95856 | elapsed time per iteration (ms): 15907.9 | learning rate: 2.654E-05 | global batch size: 48 | lm loss: 6.444118E+00 | loss scale: 32768.0 | grad norm: 236507.018 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4459/ 159576 | consumed samples: 95904 | elapsed time per iteration (ms): 15454.4 | learning rate: 2.656E-05 | global batch size: 48 | lm loss: 6.392717E+00 | loss scale: 32768.0 | grad norm: 227300.988 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4460/ 159576 | consumed samples: 95952 | elapsed time per iteration (ms): 15435.7 | learning rate: 2.657E-05 | global batch size: 48 | lm loss: 6.375526E+00 | loss scale: 32768.0 | grad norm: 217329.765 
| num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4461/ 159576 | consumed samples: 96000 | elapsed time per iteration (ms): 15463.0 | learning rate: 2.658E-05 | global batch size: 48 | lm loss: 6.442908E+00 | loss scale: 32768.0 | grad norm: 210214.078 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4462/ 159576 | consumed samples: 96048 | elapsed time per iteration (ms): 15890.8 | learning rate: 2.660E-05 | global batch size: 48 | lm loss: 6.347652E+00 | loss scale: 32768.0 | grad norm: 241592.870 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4463/ 159576 | consumed samples: 96096 | elapsed time per iteration (ms): 15523.3 | learning rate: 2.661E-05 | global batch size: 48 | lm loss: 6.408596E+00 | loss scale: 32768.0 | grad norm: 286741.620 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4464/ 159576 | consumed samples: 96144 | elapsed time per iteration (ms): 15484.1 | learning rate: 2.662E-05 | global batch size: 48 | lm loss: 6.423483E+00 | loss scale: 32768.0 | grad norm: 227347.115 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4465/ 159576 | consumed samples: 96192 | elapsed time per iteration (ms): 15505.4 | learning rate: 2.664E-05 | global batch size: 48 | lm loss: 6.465323E+00 | loss scale: 32768.0 | grad norm: 278891.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4466/ 159576 | consumed samples: 96240 | elapsed time per iteration (ms): 15734.3 | learning rate: 2.665E-05 | global batch size: 48 | lm loss: 6.540909E+00 | loss scale: 32768.0 | grad norm: 271330.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4467/ 159576 | consumed samples: 96288 | elapsed time per iteration (ms): 15463.2 | learning rate: 2.666E-05 | global batch size: 48 | lm loss: 6.366038E+00 | loss scale: 32768.0 | grad norm: 230305.551 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4468/ 159576 | consumed samples: 96336 | elapsed time per iteration (ms): 15456.1 | learning rate: 2.668E-05 | global batch size: 48 | lm loss: 6.383101E+00 | loss scale: 32768.0 | grad norm: 266194.555 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4469/ 159576 | consumed samples: 96384 | elapsed time per iteration (ms): 15450.4 | learning rate: 2.669E-05 | global batch size: 48 | lm loss: 6.383107E+00 | loss scale: 32768.0 | grad norm: 224990.535 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4470/ 159576 | consumed samples: 96432 | elapsed time per iteration (ms): 15624.0 | learning rate: 2.670E-05 | global batch size: 48 | lm loss: 6.393697E+00 | loss scale: 32768.0 | grad norm: 301446.071 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4471/ 159576 | consumed samples: 96480 | elapsed time per iteration (ms): 15530.2 | learning rate: 2.672E-05 | global batch size: 48 | lm loss: 6.364079E+00 | loss scale: 32768.0 | grad norm: 215922.999 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4472/ 159576 | consumed samples: 96528 | elapsed time per iteration (ms): 15512.2 | 
learning rate: 2.673E-05 | global batch size: 48 | lm loss: 6.373242E+00 | loss scale: 32768.0 | grad norm: 297810.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4473/ 159576 | consumed samples: 96576 | elapsed time per iteration (ms): 15493.5 | learning rate: 2.674E-05 | global batch size: 48 | lm loss: 6.458824E+00 | loss scale: 32768.0 | grad norm: 253875.814 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4474/ 159576 | consumed samples: 96624 | elapsed time per iteration (ms): 16109.8 | learning rate: 2.676E-05 | global batch size: 48 | lm loss: 6.444027E+00 | loss scale: 32768.0 | grad norm: 235767.912 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4475/ 159576 | consumed samples: 96672 | elapsed time per iteration (ms): 15442.4 | learning rate: 2.677E-05 | global batch size: 48 | lm loss: 6.379702E+00 | loss scale: 32768.0 | grad norm: 200816.895 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4476/ 159576 | consumed samples: 96720 | elapsed time per iteration (ms): 15439.1 | learning rate: 2.678E-05 | global batch size: 48 | lm loss: 6.460698E+00 | loss scale: 32768.0 | grad norm: 243887.532 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4477/ 159576 | consumed samples: 96768 | elapsed time per iteration (ms): 15842.8 | learning rate: 2.680E-05 | global batch size: 48 | lm loss: 6.425824E+00 | loss scale: 32768.0 | grad norm: 194209.566 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4478/ 159576 | consumed samples: 96816 | elapsed time per iteration (ms): 15527.8 | learning rate: 2.681E-05 | global batch size: 48 | lm loss: 6.499928E+00 | loss scale: 32768.0 | grad norm: 205164.907 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4479/ 159576 | consumed samples: 96864 | elapsed time per iteration (ms): 15497.3 | learning rate: 2.682E-05 | global batch size: 48 | lm loss: 6.333491E+00 | loss scale: 32768.0 | grad norm: 198136.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4480/ 159576 | consumed samples: 96912 | elapsed time per iteration (ms): 15608.5 | learning rate: 2.684E-05 | global batch size: 48 | lm loss: 6.393649E+00 | loss scale: 32768.0 | grad norm: 226765.459 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4481/ 159576 | consumed samples: 96960 | elapsed time per iteration (ms): 15886.4 | learning rate: 2.685E-05 | global batch size: 48 | lm loss: 6.315465E+00 | loss scale: 32768.0 | grad norm: 233990.065 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4482/ 159576 | consumed samples: 97008 | elapsed time per iteration (ms): 15388.4 | learning rate: 2.686E-05 | global batch size: 48 | lm loss: 6.467194E+00 | loss scale: 32768.0 | grad norm: 253595.606 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4483/ 159576 | consumed samples: 97056 | elapsed time per iteration (ms): 15452.6 | learning rate: 2.688E-05 | global batch size: 48 | lm loss: 6.424766E+00 | loss scale: 32768.0 | grad norm: 243792.882 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | time (ms) iteration 4484/ 159576 | consumed samples: 97104 | elapsed time per iteration (ms): 15440.8 | learning rate: 2.689E-05 | global batch size: 48 | lm loss: 6.382202E+00 | loss scale: 32768.0 | grad norm: 253619.641 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4485/ 159576 | consumed samples: 97152 | elapsed time per iteration (ms): 15758.4 | learning rate: 2.690E-05 | global batch size: 48 | lm loss: 6.420368E+00 | loss scale: 32768.0 | grad norm: 270122.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4486/ 159576 | consumed samples: 97200 | elapsed time per iteration (ms): 15504.2 | learning rate: 2.692E-05 | global batch size: 48 | lm loss: 6.341059E+00 | loss scale: 32768.0 | grad norm: 264076.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4487/ 159576 | consumed samples: 97248 | elapsed time per iteration (ms): 15564.4 | learning rate: 2.693E-05 | global batch size: 48 | lm loss: 6.351835E+00 | loss scale: 32768.0 | grad norm: 254803.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4488/ 159576 | consumed samples: 97296 | elapsed time per iteration (ms): 15603.6 | learning rate: 2.694E-05 | global batch size: 48 | lm loss: 6.344017E+00 | loss scale: 32768.0 | grad norm: 244790.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4489/ 159576 | consumed samples: 97344 | elapsed time per iteration (ms): 15804.2 | learning rate: 2.696E-05 | global batch size: 48 | lm loss: 6.487484E+00 | loss scale: 32768.0 | grad norm: 242539.962 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4490/ 159576 | consumed samples: 97392 | elapsed time per iteration (ms): 15547.3 | learning rate: 2.697E-05 | global batch size: 48 | lm loss: 6.339984E+00 | loss scale: 32768.0 | grad norm: 225575.703 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4491/ 159576 | consumed samples: 97440 | elapsed time per iteration (ms): 15475.7 | learning rate: 2.698E-05 | global batch size: 48 | lm loss: 6.449341E+00 | loss scale: 32768.0 | grad norm: 205395.664 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4492/ 159576 | consumed samples: 97488 | elapsed time per iteration (ms): 15436.0 | learning rate: 2.700E-05 | global batch size: 48 | lm loss: 6.382250E+00 | loss scale: 32768.0 | grad norm: 234078.700 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4493/ 159576 | consumed samples: 97536 | elapsed time per iteration (ms): 15764.8 | learning rate: 2.701E-05 | global batch size: 48 | lm loss: 6.425200E+00 | loss scale: 32768.0 | grad norm: 247476.491 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4494/ 159576 | consumed samples: 97584 | elapsed time per iteration (ms): 15532.5 | learning rate: 2.702E-05 | global batch size: 48 | lm loss: 6.381852E+00 | loss scale: 32768.0 | grad norm: 242648.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4495/ 159576 | consumed samples: 97632 | elapsed time per iteration (ms): 15533.1 | learning rate: 2.704E-05 | global batch size: 48 | lm loss: 
6.230868E+00 | loss scale: 32768.0 | grad norm: 219731.182 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4496/ 159576 | consumed samples: 97680 | elapsed time per iteration (ms): 15535.3 | learning rate: 2.705E-05 | global batch size: 48 | lm loss: 6.353293E+00 | loss scale: 32768.0 | grad norm: 216013.670 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4497/ 159576 | consumed samples: 97728 | elapsed time per iteration (ms): 15701.4 | learning rate: 2.706E-05 | global batch size: 48 | lm loss: 6.307485E+00 | loss scale: 32768.0 | grad norm: 253204.960 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4498/ 159576 | consumed samples: 97776 | elapsed time per iteration (ms): 15494.7 | learning rate: 2.708E-05 | global batch size: 48 | lm loss: 6.438371E+00 | loss scale: 32768.0 | grad norm: 214787.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4499/ 159576 | consumed samples: 97824 | elapsed time per iteration (ms): 15439.3 | learning rate: 2.709E-05 | global batch size: 48 | lm loss: 6.311467E+00 | loss scale: 32768.0 | grad norm: 216077.744 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4500/ 159576 | consumed samples: 97872 | elapsed time per iteration (ms): 15685.9 | learning rate: 2.710E-05 | global batch size: 48 | lm loss: 6.423208E+00 | loss scale: 32768.0 | grad norm: 207994.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 4500 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
[2021-09-24 19:52:27,219] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/global_step4500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 4500 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
time (ms) | save-checkpoint: 17750.13
iteration 4501/ 159576 | consumed samples: 97920 | elapsed time per iteration (ms): 33282.4 | learning rate: 2.712E-05 | global batch size: 48 | lm loss: 6.402827E+00 | loss scale: 32768.0 | grad norm: 259915.027 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4502/ 159576 | consumed samples: 97968 | elapsed time per iteration (ms): 15581.1 | learning rate: 2.713E-05 | global batch size: 48 | lm loss: 6.310410E+00 | loss scale: 32768.0 | grad norm: 222384.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4503/ 159576 | consumed samples: 98016 | elapsed time per iteration (ms): 15856.7 | learning rate: 2.714E-05 | global batch size: 48 | lm loss: 6.259107E+00 | loss scale: 32768.0 | grad norm: 219981.429 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4504/ 159576 | consumed samples: 98064 | elapsed time per iteration (ms): 15522.8 | learning rate: 2.716E-05 | global batch size: 48 | lm loss: 6.441791E+00 | loss scale: 32768.0 | grad norm: 235487.992 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4505/ 159576 | consumed samples: 98112 | elapsed time per iteration (ms): 15475.3 | learning rate: 2.717E-05 | global batch size: 48 | lm loss: 6.431644E+00 | loss scale: 32768.0 | grad
norm: 308152.550 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4506/ 159576 | consumed samples: 98160 | elapsed time per iteration (ms): 15475.2 | learning rate: 2.718E-05 | global batch size: 48 | lm loss: 6.437158E+00 | loss scale: 32768.0 | grad norm: 223087.933 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4507/ 159576 | consumed samples: 98208 | elapsed time per iteration (ms): 15919.3 | learning rate: 2.720E-05 | global batch size: 48 | lm loss: 6.456445E+00 | loss scale: 32768.0 | grad norm: 223422.565 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4508/ 159576 | consumed samples: 98256 | elapsed time per iteration (ms): 15503.1 | learning rate: 2.721E-05 | global batch size: 48 | lm loss: 6.409997E+00 | loss scale: 32768.0 | grad norm: 245785.630 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4509/ 159576 | consumed samples: 98304 | elapsed time per iteration (ms): 15512.1 | learning rate: 2.722E-05 | global batch size: 48 | lm loss: 6.441339E+00 | loss scale: 32768.0 | grad norm: 283619.839 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4510/ 159576 | consumed samples: 98352 | elapsed time per iteration (ms): 15548.0 | learning rate: 2.724E-05 | global batch size: 48 | lm loss: 6.441983E+00 | loss scale: 32768.0 | grad norm: 235037.042 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4511/ 159576 | consumed samples: 98400 | elapsed time per iteration (ms): 15735.6 | learning rate: 2.725E-05 | global batch size: 48 | lm loss: 6.499406E+00 | loss scale: 32768.0 | grad norm: 238925.774 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4512/ 159576 | consumed samples: 98448 | elapsed time per iteration (ms): 15495.6 | learning rate: 2.726E-05 | global batch size: 48 | lm loss: 6.429494E+00 | loss scale: 32768.0 | grad norm: 295604.429 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4513/ 159576 | consumed samples: 98496 | elapsed time per iteration (ms): 15481.9 | learning rate: 2.728E-05 | global batch size: 48 | lm loss: 6.407839E+00 | loss scale: 32768.0 | grad norm: 292842.531 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4514/ 159576 | consumed samples: 98544 | elapsed time per iteration (ms): 15479.3 | learning rate: 2.729E-05 | global batch size: 48 | lm loss: 6.440022E+00 | loss scale: 32768.0 | grad norm: 270315.805 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4515/ 159576 | consumed samples: 98592 | elapsed time per iteration (ms): 15606.8 | learning rate: 2.730E-05 | global batch size: 48 | lm loss: 6.391658E+00 | loss scale: 32768.0 | grad norm: 271519.155 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4516/ 159576 | consumed samples: 98640 | elapsed time per iteration (ms): 15492.8 | learning rate: 2.732E-05 | global batch size: 48 | lm loss: 6.445361E+00 | loss scale: 32768.0 | grad norm: 235853.751 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4517/ 159576 | consumed samples: 98688 | elapsed time per iteration 
(ms): 15525.5 | learning rate: 2.733E-05 | global batch size: 48 | lm loss: 6.274318E+00 | loss scale: 32768.0 | grad norm: 246250.889 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)

[Per-iteration training records, one row per iteration. Constant in every record below: global batch size: 48 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0; each record ends with an empty "time (ms)" field.]

iteration | consumed samples | elapsed time per iteration (ms) | learning rate | lm loss | loss scale | grad norm
4518 | 98736 | 15595.2 | 2.734E-05 | 6.378585E+00 | 32768.0 | 262163.945
4519 | 98784 | 15657.4 | 2.736E-05 | 6.398365E+00 | 32768.0 | 339087.705
4520 | 98832 | 15503.5 | 2.737E-05 | 6.435692E+00 | 32768.0 | 219944.197
4521 | 98880 | 15444.3 | 2.738E-05 | 6.418158E+00 | 32768.0 | 295809.324
4522 | 98928 | 15726.5 | 2.739E-05 | 6.317287E+00 | 32768.0 | 256139.821
4523 | 98976 | 15697.5 | 2.741E-05 | 6.210083E+00 | 32768.0 | 222390.085
4524 | 99024 | 15483.9 | 2.742E-05 | 6.357608E+00 | 32768.0 | 250631.340
4525 | 99072 | 15498.9 | 2.743E-05 | 6.439158E+00 | 32768.0 | 237183.590
4526 | 99120 | 15870.3 | 2.745E-05 | 6.477302E+00 | 32768.0 | 234590.425
4527 | 99168 | 15527.5 | 2.746E-05 | 6.404512E+00 | 32768.0 | 268737.102
4528 | 99216 | 15477.7 | 2.747E-05 | 6.357052E+00 | 32768.0 | 199055.934
4529 | 99264 | 15441.0 | 2.749E-05 | 6.418729E+00 | 32768.0 | 280337.259
4530 | 99312 | 15870.6 | 2.750E-05 | 6.394526E+00 | 32768.0 | 242159.812
4531 | 99360 | 15356.1 | 2.751E-05 | 6.454551E+00 | 32768.0 | 238356.429
4532 | 99408 | 15481.2 | 2.753E-05 | 6.479828E+00 | 32768.0 | 256781.681
4533 | 99456 | 15512.7 | 2.754E-05 | 6.347847E+00 | 32768.0 | 232593.280
4534 | 99504 | 16020.6 | 2.755E-05 | 6.361287E+00 | 32768.0 | 214859.706
4535 | 99552 | 15687.2 | 2.757E-05 | 6.344873E+00 | 32768.0 | 214653.297
4536 | 99600 | 15424.3 | 2.758E-05 | 6.273855E+00 | 32768.0 | 249309.228
4537 | 99648 | 15440.3 | 2.759E-05 | 6.373835E+00 | 32768.0 | 230963.275
4538 | 99696 | 15788.5 | 2.761E-05 | 6.381639E+00 | 32768.0 | 258586.304
4539 | 99744 | 15436.7 | 2.762E-05 | 6.464207E+00 | 32768.0 | 260715.522
4540 | 99792 | 15631.9 | 2.763E-05 | 6.282461E+00 | 32768.0 | 271394.559
4541 | 99840 | 15417.1 | 2.765E-05 | 6.323977E+00 | 32768.0 | 268740.684
4542 | 99888 | 15726.7 | 2.766E-05 | 6.419955E+00 | 32768.0 | 270171.155
4543 | 99936 | 15524.6 | 2.767E-05 | 6.456992E+00 | 32768.0 | 255182.014
4544 | 99984 | 15442.0 | 2.769E-05 | 6.327838E+00 | 32768.0 | 224129.919
4545 | 100032 | 15419.1 | 2.770E-05 | 6.374109E+00 | 32768.0 | 265872.290
4546 | 100080 | 15626.3 | 2.771E-05 | 6.332025E+00 | 32768.0 | 221965.501
4547 | 100128 | 15454.8 | 2.773E-05 | 6.399364E+00 | 32768.0 | 257839.194
4548 | 100176 | 15431.4 | 2.774E-05 | 6.411947E+00 | 32768.0 | 278135.374
4549 | 100224 | 15844.6 | 2.775E-05 | 6.477700E+00 | 32768.0 | 277855.734
4550 | 100272 | 15537.3 | 2.777E-05 | 6.526390E+00 | 32768.0 | 246063.278
4551 | 100320 | 15431.5 | 2.778E-05 | 6.391055E+00 | 32768.0 | 230174.666
4552 | 100368 | 15392.1 | 2.779E-05 | 6.381279E+00 | 32768.0 | 230427.324
4553 | 100416 | 15770.9 | 2.781E-05 | 6.438869E+00 | 32768.0 | 230488.615
4554 | 100464 | 15447.7 | 2.782E-05 | 6.245214E+00 | 32768.0 | 277295.687
4555 | 100512 | 15446.9 | 2.783E-05 | 6.413427E+00 | 32768.0 | 223183.539

[2021-09-24 20:07:07] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition)
[2021-09-24 20:07:07] PULSE: tr8-104B is running for 14:14:56 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8]))

4556 | 100560 | 15400.2 | 2.785E-05 | 6.398170E+00 | 32768.0 | 233778.721
4557 | 100608 | 15788.3 | 2.786E-05 | 6.417650E+00 | 32768.0 | 311870.109
4558 | 100656 | 15428.6 | 2.787E-05 | 6.394480E+00 | 32768.0 | 234331.495
4559 | 100704 | 15432.2 | 2.789E-05 | 6.379920E+00 | 32768.0 | 256774.134
4560 | 100752 | 15427.3 | 2.790E-05 | 6.398593E+00 | 32768.0 | 244274.326
4561 | 100800 | 15906.6 | 2.791E-05 | 6.370606E+00 | 32768.0 | 239881.224
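The records in this excerpt all follow the fixed Megatron-style format visible in the unwrapped fragment at the top. For anyone mining these chronicles, here is a minimal parsing sketch (illustrative only, not part of the training code; the regex and function names are mine) that turns the raw log text into structured records:

```python
import re
import sys

# Illustrative sketch -- parses raw Megatron-style per-iteration records,
# like the fragment at the top of this excerpt, into dicts.
RECORD_RE = re.compile(
    r"iteration\s+(?P<iteration>\d+)/\s*\d+\s*\|"
    r"\s*consumed samples:\s*(?P<consumed_samples>\d+)\s*\|"
    r"\s*elapsed time per iteration \(ms\):\s*(?P<elapsed_ms>[\d.]+)\s*\|"
    r"\s*learning rate:\s*(?P<lr>[\d.Ee+-]+)\s*\|"
    r"\s*global batch size:\s*(?P<gbs>\d+)\s*\|"
    r"\s*lm loss:\s*(?P<lm_loss>[\d.Ee+-]+)\s*\|"
    r"\s*loss scale:\s*(?P<loss_scale>[\d.]+)\s*\|"
    r"\s*grad norm:\s*(?P<grad_norm>[\d.]+)"
)

def parse_records(text):
    """Yield one dict per 'iteration N/ total | ...' record found in text."""
    for m in RECORD_RE.finditer(text):
        d = m.groupdict()
        yield {
            "iteration": int(d["iteration"]),
            "consumed_samples": int(d["consumed_samples"]),
            "elapsed_ms": float(d["elapsed_ms"]),
            "lr": float(d["lr"]),
            "global_batch_size": int(d["gbs"]),
            "lm_loss": float(d["lm_loss"]),
            "loss_scale": float(d["loss_scale"]),
            "grad_norm": float(d["grad_norm"]),
        }

if __name__ == "__main__":
    # Example: report any change in the dynamic loss scale.
    records = list(parse_records(sys.stdin.read()))
    for prev, cur in zip(records, records[1:]):
        if cur["loss_scale"] != prev["loss_scale"]:
            print(f"loss scale {prev['loss_scale']:.1f} -> "
                  f"{cur['loss_scale']:.1f} at iteration {cur['iteration']}")
```

Run against a saved copy of the raw log on stdin; on this excerpt it would flag the 32768.0 -> 16384.0 loss-scale transition at iteration 4691 below.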
iteration | consumed samples | elapsed time per iteration (ms) | learning rate | lm loss | loss scale | grad norm
4562 | 100848 | 15436.7 | 2.793E-05 | 6.449897E+00 | 32768.0 | 244189.290
4563 | 100896 | 15423.9 | 2.794E-05 | 6.361297E+00 | 32768.0 | 214769.520
4564 | 100944 | 15485.4 | 2.795E-05 | 6.315623E+00 | 32768.0 | 238075.723
4565 | 100992 | 15712.7 | 2.797E-05 | 6.407779E+00 | 32768.0 | 219946.422
4566 | 101040 | 15450.4 | 2.798E-05 | 6.417436E+00 | 32768.0 | 240930.366
4567 | 101088 | 15429.7 | 2.799E-05 | 6.436010E+00 | 32768.0 | 314077.087
4568 | 101136 | 15422.9 | 2.801E-05 | 6.520737E+00 | 32768.0 | 274297.002
4569 | 101184 | 15586.4 | 2.802E-05 | 6.416994E+00 | 32768.0 | 231703.132
4570 | 101232 | 15422.0 | 2.803E-05 | 6.319811E+00 | 32768.0 | 231530.726
4571 | 101280 | 15338.3 | 2.805E-05 | 6.400026E+00 | 32768.0 | 257733.850
4572 | 101328 | 15446.6 | 2.806E-05 | 6.435762E+00 | 32768.0 | 268511.480
4573 | 101376 | 15589.8 | 2.807E-05 | 6.406414E+00 | 32768.0 | 233768.669
4574 | 101424 | 15349.3 | 2.809E-05 | 6.437346E+00 | 32768.0 | 269214.009
4575 | 101472 | 15388.4 | 2.810E-05 | 6.352981E+00 | 32768.0 | 243418.743
4576 | 101520 | 15469.0 | 2.811E-05 | 6.355519E+00 | 32768.0 | 255521.793
4577 | 101568 | 15986.1 | 2.813E-05 | 6.380365E+00 | 32768.0 | 263123.213
4578 | 101616 | 15483.5 | 2.814E-05 | 6.442792E+00 | 32768.0 | 264664.009
4579 | 101664 | 15482.0 | 2.815E-05 | 6.300795E+00 | 32768.0 | 263093.923
4580 | 101712 | 15915.5 | 2.817E-05 | 6.509340E+00 | 32768.0 | 325066.014
4581 | 101760 | 15478.8 | 2.818E-05 | 6.417569E+00 | 32768.0 | 317932.491
4582 | 101808 | 15467.6 | 2.819E-05 | 6.391977E+00 | 32768.0 | 265433.359
4583 | 101856 | 15463.2 | 2.821E-05 | 6.493138E+00 | 32768.0 | 262301.719
4584 | 101904 | 15787.5 | 2.822E-05 | 6.358137E+00 | 32768.0 | 302003.298
4585 | 101952 | 15486.8 | 2.823E-05 | 6.398649E+00 | 32768.0 | 241427.078
4586 | 102000 | 15502.1 | 2.825E-05 | 6.450002E+00 | 32768.0 | 288231.307
4587 | 102048 | 15613.4 | 2.826E-05 | 6.463566E+00 | 32768.0 | 255700.156
4588 | 102096 | 16100.7 | 2.827E-05 | 6.440113E+00 | 32768.0 | 228589.163
4589 | 102144 | 15550.6 | 2.829E-05 | 6.330764E+00 | 32768.0 | 253562.437
4590 | 102192 | 15504.0 | 2.830E-05 | 6.565317E+00 | 32768.0 | 248109.457
4591 | 102240 | 15500.8 | 2.831E-05 | 6.432470E+00 | 32768.0 | 258408.480
4592 | 102288 | 15682.0 | 2.833E-05 | 6.388723E+00 | 32768.0 | 255460.696
4593 | 102336 | 15624.8 | 2.834E-05 | 6.252523E+00 | 32768.0 | 247063.847
4594 | 102384 | 15619.9 | 2.835E-05 | 6.256584E+00 | 32768.0 | 252094.746
4595 | 102432 | 15618.3 | 2.837E-05 | 6.422144E+00 | 32768.0 | 327415.393
4596 | 102480 | 15731.1 | 2.838E-05 | 6.362859E+00 | 32768.0 | 271628.783
4597 | 102528 | 15470.5 | 2.839E-05 | 6.400634E+00 | 32768.0 | 270235.866
4598 | 102576 | 15494.8 | 2.841E-05 | 6.409593E+00 | 32768.0 | 246051.964
4599 | 102624 | 15503.4 | 2.842E-05 | 6.286301E+00 | 32768.0 | 315951.056
4600 | 102672 | 15657.8 | 2.843E-05 | 6.424391E+00 | 32768.0 | 257970.239
4601 | 102720 | 15415.9 | 2.845E-05 | 6.419086E+00 | 32768.0 | 232614.820
4602 | 102768 | 15506.4 | 2.846E-05 | 6.598701E+00 | 32768.0 | 269465.797
4603 | 102816 | 15842.0 | 2.847E-05 | 6.374152E+00 | 32768.0 | 256871.390
4604 | 102864 | 15661.0 | 2.849E-05 | 6.330672E+00 | 32768.0 | 261276.305
4605 | 102912 | 15453.1 | 2.850E-05 | 6.409989E+00 | 32768.0 | 213427.896
4606 | 102960 | 15529.1 | 2.851E-05 | 6.409967E+00 | 32768.0 | 343079.843
4607 | 103008 | 15784.9 | 2.853E-05 | 6.345381E+00 | 32768.0 | 288014.524
4608 | 103056 | 15407.4 | 2.854E-05 | 6.160167E+00 | 32768.0 | 236948.790
4609 | 103104 | 15521.9 | 2.855E-05 | 6.368454E+00 | 32768.0 | 346716.620
4610 | 103152 | 15546.6 | 2.857E-05 | 6.485950E+00 | 32768.0 | 249193.625
4611 | 103200 | 15842.5 | 2.858E-05 | 6.433112E+00 | 32768.0 | 245691.542
4612 | 103248 | 15452.2 | 2.859E-05 | 6.453573E+00 | 32768.0 | 326844.652
4613 | 103296 | 15454.7 | 2.861E-05 | 6.431165E+00 | 32768.0 | 289334.369
4614 | 103344 | 15458.5 | 2.862E-05 | 6.229577E+00 | 32768.0 | 256574.569
4615 | 103392 | 15900.6 | 2.863E-05 | 6.432065E+00 | 32768.0 | 273324.041
4616 | 103440 | 15568.2 | 2.865E-05 | 6.373868E+00 | 32768.0 | 289471.232
4617 | 103488 | 15491.7 | 2.866E-05 | 6.302549E+00 | 32768.0 | 421148.983
4618 | 103536 | 15549.9 | 2.867E-05 | 6.278319E+00 | 32768.0 | 346570.622
4619 | 103584 | 15749.4 | 2.869E-05 | 6.394638E+00 | 32768.0 | 356110.872
4620 | 103632 | 15472.2 | 2.870E-05 | 6.303448E+00 | 32768.0 | 328724.972
4621 | 103680 | 15427.3 | 2.871E-05 | 6.544609E+00 | 32768.0 | 324100.834
4622 | 103728 | 15472.5 | 2.873E-05 | 6.314513E+00 | 32768.0 | 275878.819
4623 | 103776 | 15583.2 | 2.874E-05 | 6.398262E+00 | 32768.0 | 263126.230
4624 | 103824 | 15483.7 | 2.875E-05 | 6.474843E+00 | 32768.0 | 242329.963
4625 | 103872 | 15477.6 | 2.877E-05 | 6.408014E+00 | 32768.0 | 267696.261
4626 | 103920 | 15516.2 | 2.878E-05 | 6.847461E+00 | 32768.0 | 713094.141
4627 | 103968 | 15724.2 | 2.879E-05 | 6.386415E+00 | 32768.0 | 272846.125
4628 | 104016 | 15456.1 | 2.881E-05 | 6.446278E+00 | 32768.0 | 379795.778
4629 | 104064 | 15435.5 | 2.882E-05 | 6.469239E+00 | 32768.0 | 207715.801
4630 | 104112 | 15698.1 | 2.883E-05 | 6.357453E+00 | 32768.0 | 236792.203
4631 | 104160 | 15489.5 | 2.885E-05 | 6.448473E+00 | 32768.0 | 225431.411
4632 | 104208 | 15562.5 | 2.886E-05 | 6.377034E+00 | 32768.0 | 375353.459
4633 | 104256 | 15569.5 | 2.887E-05 | 6.516908E+00 | 32768.0 | 333588.373
4634 | 104304 | 15928.9 | 2.889E-05 | 6.574339E+00 | 32768.0 | 243589.856
4635 | 104352 | 15531.5 | 2.890E-05 | 6.475029E+00 | 32768.0 | 442923.681
4636 | 104400 | 15560.0 | 2.891E-05 | 6.369026E+00 | 32768.0 | 295484.961
4637 | 104448 | 15543.7 | 2.893E-05 | 6.490546E+00 | 32768.0 | 279233.122
4638 | 104496 | 15916.4 | 2.894E-05 | 6.437621E+00 | 32768.0 | 245214.935
4639 | 104544 | 15547.5 | 2.895E-05 | 6.491655E+00 | 32768.0 | 240217.342
4640 | 104592 | 15573.7 | 2.897E-05 | 6.455505E+00 | 32768.0 | 317400.165
4641 | 104640 | 15624.7 | 2.898E-05 | 6.482111E+00 | 32768.0 | 244102.198
4642 | 104688 | 16106.5 | 2.899E-05 | 6.281504E+00 | 32768.0 | 282861.527
4643 | 104736 | 15639.7 | 2.901E-05 | 6.420715E+00 | 32768.0 | 274009.202
4644 | 104784 | 15520.7 | 2.902E-05 | 6.342989E+00 | 32768.0 | 226933.382
4645 | 104832 | 15501.6 | 2.903E-05 | 6.427937E+00 | 32768.0 | 278047.939
4646 | 104880 | 15629.3 | 2.905E-05 | 6.294481E+00 | 32768.0 | 235356.190
4647 | 104928 | 15591.9 | 2.906E-05 | 6.363388E+00 | 32768.0 | 600293.405
4648 | 104976 | 15595.2 | 2.907E-05 | 6.377505E+00 | 32768.0 | 331377.856
4649 | 105024 | 15628.4 | 2.909E-05 | 6.381812E+00 | 32768.0 | 200005.238
4650 | 105072 | 15748.7 | 2.910E-05 | 6.338908E+00 | 32768.0 | 242913.858
4651 | 105120 | 15511.3 | 2.911E-05 | 6.419736E+00 | 32768.0 | 330409.745
4652 | 105168 | 15516.3 | 2.913E-05 | 6.404620E+00 | 32768.0 | 318144.336
4653 | 105216 | 15876.3 | 2.914E-05 | 6.377990E+00 | 32768.0 | 232202.485
4654 | 105264 | 15718.5 | 2.915E-05 | 6.383665E+00 | 32768.0 | 241524.475
4655 | 105312 | 15610.4 | 2.917E-05 | 6.403493E+00 | 32768.0 | 373231.364
4656 | 105360 | 15640.8 | 2.918E-05 | 6.329133E+00 | 32768.0 | 286954.758
4657 | 105408 | 15996.4 | 2.919E-05 | 6.748344E+00 | 32768.0 | 260947.100
4658 | 105456 | 15522.2 | 2.921E-05 | 6.315388E+00 | 32768.0 | 279560.800
4659 | 105504 | 15546.8 | 2.922E-05 | 6.351707E+00 | 32768.0 | 270238.544
4660 | 105552 | 15483.2 | 2.923E-05 | 6.338678E+00 | 32768.0 | 299765.314
4661 | 105600 | 15828.0 | 2.925E-05 | 6.427124E+00 | 32768.0 | 302484.019
4662 | 105648 | 15644.1 | 2.926E-05 | 6.407690E+00 | 32768.0 | 286169.997
4663 | 105696 | 15583.7 | 2.927E-05 | 6.254132E+00 | 32768.0 | 276778.381
4664 | 105744 | 15651.6 | 2.929E-05 | 6.469905E+00 | 32768.0 | 279741.368
4665 | 105792 | 15818.3 | 2.930E-05 | 6.508596E+00 | 32768.0 | 336670.270
4666 | 105840 | 15552.5 | 2.931E-05 | 6.434944E+00 | 32768.0 | 242396.784
4667 | 105888 | 15512.6 | 2.933E-05 | 6.510550E+00 | 32768.0 | 252220.315
4668 | 105936 | 15495.7 | 2.934E-05 | 6.399008E+00 | 32768.0 | 288495.864
4669 | 105984 | 15668.5 | 2.935E-05 | 6.404999E+00 | 32768.0 | 244327.032
4670 | 106032 | 15562.9 | 2.937E-05 | 6.418772E+00 | 32768.0 | 313672.915
4671 | 106080 | 15630.7 | 2.938E-05 | 6.361070E+00 | 32768.0 | 276763.857
4672 | 106128 | 15597.8 | 2.939E-05 | 6.477580E+00 | 32768.0 | 230503.822
4673 | 106176 | 15696.4 | 2.941E-05 | 6.517149E+00 | 32768.0 | 217937.765
4674 | 106224 | 15548.7 | 2.942E-05 | 6.380251E+00 | 32768.0 | 267703.433
4675 | 106272 | 15515.6 | 2.943E-05 | 6.348250E+00 | 32768.0 | 309305.174
4676 | 106320 | 15795.7 | 2.945E-05 | 6.461040E+00 | 32768.0 | 285074.708
4677 | 106368 | 15718.4 | 2.946E-05 | 6.388801E+00 | 32768.0 | 292644.236
4678 | 106416 | 15585.4 | 2.947E-05 | 6.417225E+00 | 32768.0 | 334812.598
4679 | 106464 | 15631.1 | 2.949E-05 | 6.357790E+00 | 32768.0 | 301017.925
4680 | 106512 | 15891.7 | 2.950E-05 | 6.556364E+00 | 32768.0 | 280065.506
4681 | 106560 | 15562.2 | 2.951E-05 | 6.393982E+00 | 32768.0 | 242731.164
4682 | 106608 | 15526.5 | 2.953E-05 | 6.396220E+00 | 32768.0 | 407344.753
4683 | 106656 | 15526.3 | 2.954E-05 | 6.396249E+00 | 32768.0 | 300342.299
4684 | 106704 | 15885.4 | 2.955E-05 | 6.375283E+00 | 32768.0 | 296501.436
4685 | 106752 | 15527.4 | 2.957E-05 | 6.418046E+00 | 32768.0 | 290100.249
4686 | 106800 | 15621.1 | 2.958E-05 | 6.300463E+00 | 32768.0 | 265814.471
4687 | 106848 | 15592.0 | 2.959E-05 | 6.440179E+00 | 32768.0 | 354690.307
4688 | 106896 | 15963.5 | 2.961E-05 | 6.396194E+00 | 32768.0 | 259594.010
4689 | 106944 | 15540.2 | 2.962E-05 | 6.459390E+00 | 32768.0 | 326661.756
4690 | 106992 | 15512.7 | 2.963E-05 | 6.324084E+00 | 32768.0 | 288829.158
4691 | 107040 | 8709.6 | 2.963E-05 | 6.781525E+00 | 16384.0 | 288829.158
4692 | 107088 | 15305.7 | 2.964E-05 | 6.431325E+00 | 16384.0 | 145022.360
4693 | 107136 | 15550.9 | 2.966E-05 | 6.516616E+00 | 16384.0 | 155613.709
4694 | 107184 | 15526.9 | 2.967E-05 | 6.387960E+00 | 16384.0 | 134461.471
4695 | 107232 | 15497.0 | 2.968E-05 | 6.392653E+00 | 16384.0 | 141822.076
4696 | 107280 | 15923.9 | 2.970E-05 | 6.412030E+00 | 16384.0 | 175057.651
4697 | 107328 | 15425.2 | 2.971E-05 | 6.373864E+00 | 16384.0 | 282779.549
4698 | 107376 | 15454.6 | 2.972E-05 | 6.306759E+00 | 16384.0 | 136700.298
4699 | 107424 | 15528.9 | 2.974E-05 | 6.335629E+00 | 16384.0 | 184501.539
4700 | 107472 | 15956.8 | 2.975E-05 | 6.408161E+00 | 16384.0 | 173148.921
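Worth flagging in the rows above: at iteration 4691 the loss scale halves from 32768.0 to 16384.0, the step completes in 8709.6 ms instead of the usual ~15500 ms, the grad norm repeats the previous step's value (288829.158), and lm loss jumps to 6.781525E+00. That combination is the usual signature of fp16 dynamic loss scaling backing off after an overflow in the backward pass. A minimal sketch of the common halve-on-overflow / regrow-after-N-good-steps rule (illustrative only; not the exact Megatron-DeepSpeed implementation):

```python
class DynamicLossScaler:
    """Minimal sketch of halve-on-overflow dynamic loss scaling."""

    def __init__(self, init_scale=2.0**15, growth_interval=1000):
        self.scale = init_scale        # 2**15 = 32768.0, as in the log above
        self.growth_interval = growth_interval
        self._good_steps = 0

    def update(self, found_overflow: bool) -> None:
        if found_overflow:
            # Back off: halve the scale (32768.0 -> 16384.0 at iteration
            # 4691) so the next backward pass stays inside fp16 range.
            self.scale /= 2.0
            self._good_steps = 0
        else:
            # Regrow cautiously after a run of overflow-free steps.
            self._good_steps += 1
            if self._good_steps == self.growth_interval:
                self.scale *= 2.0
                self._good_steps = 0
```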
number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4701/ 159576 | consumed samples: 107520 | elapsed time per iteration (ms): 15601.2 | learning rate: 2.976E-05 | global batch size: 48 | lm loss: 6.452803E+00 | loss scale: 16384.0 | grad norm: 175212.053 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4702/ 159576 | consumed samples: 107568 | elapsed time per iteration (ms): 15499.9 | learning rate: 2.978E-05 | global batch size: 48 | lm loss: 6.444376E+00 | loss scale: 16384.0 | grad norm: 154484.468 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4703/ 159576 | consumed samples: 107616 | elapsed time per iteration (ms): 15505.8 | learning rate: 2.979E-05 | global batch size: 48 | lm loss: 6.378032E+00 | loss scale: 16384.0 | grad norm: 157853.641 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4704/ 159576 | consumed samples: 107664 | elapsed time per iteration (ms): 15797.2 | learning rate: 2.980E-05 | global batch size: 48 | lm loss: 6.433157E+00 | loss scale: 16384.0 | grad norm: 189038.636 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4705/ 159576 | consumed samples: 107712 | elapsed time per iteration (ms): 15428.0 | learning rate: 2.982E-05 | global batch size: 48 | lm loss: 6.345381E+00 | loss scale: 16384.0 | grad norm: 223066.594 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4706/ 159576 | consumed samples: 107760 | elapsed time per iteration (ms): 15506.2 | learning rate: 2.983E-05 | global batch size: 48 | lm loss: 6.409193E+00 | loss scale: 16384.0 | grad norm: 138366.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4707/ 159576 | consumed samples: 107808 | elapsed time per iteration (ms): 15469.9 | learning rate: 2.984E-05 | global batch size: 48 | lm loss: 6.454758E+00 | loss scale: 16384.0 | grad norm: 144072.711 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4708/ 159576 | consumed samples: 107856 | elapsed time per iteration (ms): 15711.5 | learning rate: 2.986E-05 | global batch size: 48 | lm loss: 6.418115E+00 | loss scale: 16384.0 | grad norm: 160060.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4709/ 159576 | consumed samples: 107904 | elapsed time per iteration (ms): 15549.5 | learning rate: 2.987E-05 | global batch size: 48 | lm loss: 6.323099E+00 | loss scale: 16384.0 | grad norm: 158794.827 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4710/ 159576 | consumed samples: 107952 | elapsed time per iteration (ms): 15458.0 | learning rate: 2.988E-05 | global batch size: 48 | lm loss: 6.418284E+00 | loss scale: 16384.0 | grad norm: 172985.051 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4711/ 159576 | consumed samples: 108000 | elapsed time per iteration (ms): 15477.2 | learning rate: 2.990E-05 | global batch size: 48 | lm loss: 6.449984E+00 | loss scale: 16384.0 | grad norm: 151942.015 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4712/ 159576 | consumed samples: 108048 | elapsed time per iteration (ms): 15912.6 | learning 
rate: 2.991E-05 | global batch size: 48 | lm loss: 6.331490E+00 | loss scale: 16384.0 | grad norm: 148710.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4713/ 159576 | consumed samples: 108096 | elapsed time per iteration (ms): 15440.5 | learning rate: 2.992E-05 | global batch size: 48 | lm loss: 6.445600E+00 | loss scale: 16384.0 | grad norm: 136119.725 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4714/ 159576 | consumed samples: 108144 | elapsed time per iteration (ms): 15519.8 | learning rate: 2.994E-05 | global batch size: 48 | lm loss: 6.276518E+00 | loss scale: 16384.0 | grad norm: 170811.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4715/ 159576 | consumed samples: 108192 | elapsed time per iteration (ms): 15866.2 | learning rate: 2.995E-05 | global batch size: 48 | lm loss: 6.430917E+00 | loss scale: 16384.0 | grad norm: 145058.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4716/ 159576 | consumed samples: 108240 | elapsed time per iteration (ms): 15520.8 | learning rate: 2.996E-05 | global batch size: 48 | lm loss: 6.459754E+00 | loss scale: 16384.0 | grad norm: 146862.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4717/ 159576 | consumed samples: 108288 | elapsed time per iteration (ms): 15578.0 | learning rate: 2.998E-05 | global batch size: 48 | lm loss: 6.447017E+00 | loss scale: 16384.0 | grad norm: 172505.739 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4718/ 159576 | consumed samples: 108336 | elapsed time per iteration (ms): 15434.8 | learning rate: 2.999E-05 | global batch size: 48 | lm loss: 6.316633E+00 | loss scale: 16384.0 | grad norm: 130149.169 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4719/ 159576 | consumed samples: 108384 | elapsed time per iteration (ms): 15703.7 | learning rate: 3.000E-05 | global batch size: 48 | lm loss: 6.376626E+00 | loss scale: 16384.0 | grad norm: 198273.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4720/ 159576 | consumed samples: 108432 | elapsed time per iteration (ms): 15522.7 | learning rate: 3.002E-05 | global batch size: 48 | lm loss: 6.340569E+00 | loss scale: 16384.0 | grad norm: 189583.946 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4721/ 159576 | consumed samples: 108480 | elapsed time per iteration (ms): 15419.9 | learning rate: 3.003E-05 | global batch size: 48 | lm loss: 6.519832E+00 | loss scale: 16384.0 | grad norm: 148280.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4722/ 159576 | consumed samples: 108528 | elapsed time per iteration (ms): 15537.6 | learning rate: 3.004E-05 | global batch size: 48 | lm loss: 6.519564E+00 | loss scale: 16384.0 | grad norm: 165136.082 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4723/ 159576 | consumed samples: 108576 | elapsed time per iteration (ms): 15984.2 | learning rate: 3.006E-05 | global batch size: 48 | lm loss: 6.331813E+00 | loss scale: 16384.0 | grad norm: 137134.914 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4724/ 159576 | consumed samples: 108624 | elapsed time per iteration (ms): 15591.8 | learning rate: 3.007E-05 | global batch size: 48 | lm loss: 6.417581E+00 | loss scale: 16384.0 | grad norm: 135525.990 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4725/ 159576 | consumed samples: 108672 | elapsed time per iteration (ms): 15458.7 | learning rate: 3.008E-05 | global batch size: 48 | lm loss: 6.369280E+00 | loss scale: 16384.0 | grad norm: 135730.698 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4726/ 159576 | consumed samples: 108720 | elapsed time per iteration (ms): 15476.9 | learning rate: 3.010E-05 | global batch size: 48 | lm loss: 6.320598E+00 | loss scale: 16384.0 | grad norm: 147233.060 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4727/ 159576 | consumed samples: 108768 | elapsed time per iteration (ms): 15812.7 | learning rate: 3.011E-05 | global batch size: 48 | lm loss: 6.469586E+00 | loss scale: 16384.0 | grad norm: 164519.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4728/ 159576 | consumed samples: 108816 | elapsed time per iteration (ms): 15490.9 | learning rate: 3.012E-05 | global batch size: 48 | lm loss: 6.473386E+00 | loss scale: 16384.0 | grad norm: 151619.547 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4729/ 159576 | consumed samples: 108864 | elapsed time per iteration (ms): 15470.7 | learning rate: 3.014E-05 | global batch size: 48 | lm loss: 6.340328E+00 | loss scale: 16384.0 | grad norm: 137036.044 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4730/ 159576 | consumed samples: 108912 | elapsed time per iteration (ms): 15531.2 | learning rate: 3.015E-05 | global batch size: 48 | lm loss: 6.394744E+00 | loss scale: 16384.0 | grad norm: 146186.033 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4731/ 159576 | consumed samples: 108960 | elapsed time per iteration (ms): 15606.4 | learning rate: 3.016E-05 | global batch size: 48 | lm loss: 6.362489E+00 | loss scale: 16384.0 | grad norm: 187444.936 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4732/ 159576 | consumed samples: 109008 | elapsed time per iteration (ms): 15504.3 | learning rate: 3.018E-05 | global batch size: 48 | lm loss: 6.456880E+00 | loss scale: 16384.0 | grad norm: 129595.559 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4733/ 159576 | consumed samples: 109056 | elapsed time per iteration (ms): 15474.7 | learning rate: 3.019E-05 | global batch size: 48 | lm loss: 6.443705E+00 | loss scale: 16384.0 | grad norm: 137176.536 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4734/ 159576 | consumed samples: 109104 | elapsed time per iteration (ms): 15468.7 | learning rate: 3.020E-05 | global batch size: 48 | lm loss: 6.325924E+00 | loss scale: 16384.0 | grad norm: 130886.931 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4735/ 159576 | consumed samples: 109152 | elapsed time per iteration (ms): 15622.9 | learning rate: 3.022E-05 | global batch size: 48 | lm loss: 6.367020E+00 | loss scale: 16384.0 | grad norm: 133365.928 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
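For readers skimming these chronicles: each record above is one Megatron-style iteration line with a fixed field order. A minimal parsing sketch, assuming only the field names visible in the lines above (the `parse_iteration_line` helper and its regex are ours for offline analysis, not part of Megatron-LM):

```python
import re
from typing import Optional

# Field layout taken from the log lines above; this helper is a
# convenience for analysis, not part of Megatron-LM itself.
ITER_RE = re.compile(
    r"iteration\s+(?P<it>\d+)/\s*\d+\s*\|"
    r"\s*consumed samples:\s*(?P<samples>\d+)\s*\|"
    r"\s*elapsed time per iteration \(ms\):\s*(?P<ms>[\d.]+)\s*\|"
    r"\s*learning rate:\s*(?P<lr>[\d.E+-]+)\s*\|"
    r"\s*global batch size:\s*(?P<gbs>\d+)\s*\|"
    r"\s*lm loss:\s*(?P<loss>[\d.E+-]+)\s*\|"
    r"\s*loss scale:\s*(?P<scale>[\d.]+)\s*\|"
    r"\s*grad norm:\s*(?P<gnorm>[\d.]+)"
)

def parse_iteration_line(line: str) -> Optional[dict]:
    """Parse one iteration record into numeric fields, or return None."""
    m = ITER_RE.search(line)
    if m is None:
        return None
    return {
        "iteration": int(m["it"]),
        "consumed_samples": int(m["samples"]),
        "ms_per_iter": float(m["ms"]),
        "lr": float(m["lr"]),
        "global_batch_size": int(m["gbs"]),
        "lm_loss": float(m["loss"]),
        "loss_scale": float(m["scale"]),
        "grad_norm": float(m["gnorm"]),
    }
```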
iteration 4736/ 159576 | consumed samples: 109200 | elapsed time per iteration (ms): 15496.0 | learning rate: 3.023E-05 | global batch size: 48 | lm loss: 6.366150E+00 | loss scale: 16384.0 | grad norm: 170880.695 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4737/ 159576 | consumed samples: 109248 | elapsed time per iteration (ms): 15489.1 | learning rate: 3.024E-05 | global batch size: 48 | lm loss: 6.352594E+00 | loss scale: 16384.0 | grad norm: 126383.624 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4738/ 159576 | consumed samples: 109296 | elapsed time per iteration (ms): 15753.5 | learning rate: 3.026E-05 | global batch size: 48 | lm loss: 6.439698E+00 | loss scale: 16384.0 | grad norm: 178764.163 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4739/ 159576 | consumed samples: 109344 | elapsed time per iteration (ms): 15669.9 | learning rate: 3.027E-05 | global batch size: 48 | lm loss: 6.379218E+00 | loss scale: 16384.0 | grad norm: 140248.496 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4740/ 159576 | consumed samples: 109392 | elapsed time per iteration (ms): 15472.2 | learning rate: 3.028E-05 | global batch size: 48 | lm loss: 6.455700E+00 | loss scale: 16384.0 | grad norm: 141297.672 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4741/ 159576 | consumed samples: 109440 | elapsed time per iteration (ms): 15470.3 | learning rate: 3.030E-05 | global batch size: 48 | lm loss: 6.395582E+00 | loss scale: 16384.0 | grad norm: 132933.676 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4742/ 159576 | consumed samples: 109488 | elapsed time per iteration (ms): 15846.4 | learning rate: 3.031E-05 | global batch size: 48 | lm loss: 6.391361E+00 | loss scale: 16384.0 | grad norm: 118703.557 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4743/ 159576 | consumed samples: 109536 | elapsed time per iteration (ms): 15513.5 | learning rate: 3.032E-05 | global batch size: 48 | lm loss: 6.428627E+00 | loss scale: 16384.0 | grad norm: 138048.574 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4744/ 159576 | consumed samples: 109584 | elapsed time per iteration (ms): 15514.2 | learning rate: 3.034E-05 | global batch size: 48 | lm loss: 6.294309E+00 | loss scale: 16384.0 | grad norm: 140003.576 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4745/ 159576 | consumed samples: 109632 | elapsed time per iteration (ms): 15479.8 | learning rate: 3.035E-05 | global batch size: 48 | lm loss: 6.442544E+00 | loss scale: 16384.0 | grad norm: 137520.854 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4746/ 159576 | consumed samples: 109680 | elapsed time per iteration (ms): 15909.9 | learning rate: 3.036E-05 | global batch size: 48 | lm loss: 6.330937E+00 | loss scale: 16384.0 | grad norm: 133869.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4747/ 159576 | consumed samples: 109728 | elapsed time per iteration (ms): 15438.5 | learning rate: 3.038E-05 | global batch size: 48 | lm loss: 6.375879E+00 | loss scale: 16384.0 | grad norm: 186074.447 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4748/ 159576 | consumed samples: 109776 | elapsed time per iteration (ms): 15478.1 | learning rate: 3.039E-05 | global batch size: 48 | lm loss: 6.291435E+00 | loss scale: 16384.0 | grad norm: 133042.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4749/ 159576 | consumed samples: 109824 | elapsed time per iteration (ms): 15511.0 | learning rate: 3.040E-05 | global batch size: 48 | lm loss: 6.392264E+00 | loss scale: 16384.0 | grad norm: 142954.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4750/ 159576 | consumed samples: 109872 | elapsed time per iteration (ms): 15876.7 | learning rate: 3.042E-05 | global batch size: 48 | lm loss: 7.872174E+00 | loss scale: 16384.0 | grad norm: 409825.671 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4751/ 159576 | consumed samples: 109920 | elapsed time per iteration (ms): 15539.2 | learning rate: 3.043E-05 | global batch size: 48 | lm loss: 6.478594E+00 | loss scale: 16384.0 | grad norm: 125638.703 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4752/ 159576 | consumed samples: 109968 | elapsed time per iteration (ms): 15507.7 | learning rate: 3.044E-05 | global batch size: 48 | lm loss: 6.357571E+00 | loss scale: 16384.0 | grad norm: 108403.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4753/ 159576 | consumed samples: 110016 | elapsed time per iteration (ms): 15485.4 | learning rate: 3.046E-05 | global batch size: 48 | lm loss: 6.517112E+00 | loss scale: 16384.0 | grad norm: 101971.645 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4754/ 159576 | consumed samples: 110064 | elapsed time per iteration (ms): 15669.7 | learning rate: 3.047E-05 | global batch size: 48 | lm loss: 6.311660E+00 | loss scale: 16384.0 | grad norm: 117424.161 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4755/ 159576 | consumed samples: 110112 | elapsed time per iteration (ms): 15529.0 | learning rate: 3.048E-05 | global batch size: 48 | lm loss: 6.452873E+00 | loss scale: 16384.0 | grad norm: 153333.779 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4756/ 159576 | consumed samples: 110160 | elapsed time per iteration (ms): 15556.8 | learning rate: 3.050E-05 | global batch size: 48 | lm loss: 6.470776E+00 | loss scale: 16384.0 | grad norm: 123606.469 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4757/ 159576 | consumed samples: 110208 | elapsed time per iteration (ms): 15535.1 | learning rate: 3.051E-05 | global batch size: 48 | lm loss: 6.444992E+00 | loss scale: 16384.0 | grad norm: 103337.864 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4758/ 159576 | consumed samples: 110256 | elapsed time per iteration (ms): 15670.4 | learning rate: 3.052E-05 | global batch size: 48 | lm loss: 6.402925E+00 | loss scale: 16384.0 | grad norm: 145142.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
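Note iteration 4750 just above: lm loss jumps from ~6.39 to 7.872174 and the grad norm to 409825.671, roughly three times its neighbours, and both recover within one step. The loss scale is untouched, so nothing overflowed; it reads like a transient bad batch rather than instability. A sketch of how one might flag such transients offline, reusing `parse_iteration_line` from the sketch above (the window and threshold are illustrative choices, not values from this run):

```python
from statistics import mean, stdev

def find_loss_spikes(records, window=20, sigmas=3.0):
    """Flag iterations whose lm loss exceeds the trailing-window mean
    by `sigmas` standard deviations. Thresholds are illustrative."""
    spikes = []
    for i in range(window, len(records)):
        hist = [r["lm_loss"] for r in records[i - window:i]]
        mu, sd = mean(hist), stdev(hist)
        r = records[i]
        if sd > 0 and r["lm_loss"] > mu + sigmas * sd:
            spikes.append((r["iteration"], r["lm_loss"], round(mu, 3)))
    return spikes  # on this stretch it would flag iteration 4750 (7.872 vs ~6.40)
```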
iteration 4759/ 159576 | consumed samples: 110304 | elapsed time per iteration (ms): 15615.8 | learning rate: 3.054E-05 | global batch size: 48 | lm loss: 6.383159E+00 | loss scale: 16384.0 | grad norm: 115666.450 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4760/ 159576 | consumed samples: 110352 | elapsed time per iteration (ms): 15593.7 | learning rate: 3.055E-05 | global batch size: 48 | lm loss: 6.288662E+00 | loss scale: 16384.0 | grad norm: 125590.923 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4761/ 159576 | consumed samples: 110400 | elapsed time per iteration (ms): 15582.7 | learning rate: 3.056E-05 | global batch size: 48 | lm loss: 6.460382E+00 | loss scale: 16384.0 | grad norm: 131535.871 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4762/ 159576 | consumed samples: 110448 | elapsed time per iteration (ms): 15777.3 | learning rate: 3.058E-05 | global batch size: 48 | lm loss: 6.421331E+00 | loss scale: 16384.0 | grad norm: 123507.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4763/ 159576 | consumed samples: 110496 | elapsed time per iteration (ms): 15542.1 | learning rate: 3.059E-05 | global batch size: 48 | lm loss: 6.471745E+00 | loss scale: 16384.0 | grad norm: 142533.784 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4764/ 159576 | consumed samples: 110544 | elapsed time per iteration (ms): 15505.7 | learning rate: 3.060E-05 | global batch size: 48 | lm loss: 6.437591E+00 | loss scale: 16384.0 | grad norm: 150206.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4765/ 159576 | consumed samples: 110592 | elapsed time per iteration (ms): 15784.9 | learning rate: 3.062E-05 | global batch size: 48 | lm loss: 6.426904E+00 | loss scale: 16384.0 | grad norm: 117533.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4766/ 159576 | consumed samples: 110640 | elapsed time per iteration (ms): 15571.9 | learning rate: 3.063E-05 | global batch size: 48 | lm loss: 6.361554E+00 | loss scale: 16384.0 | grad norm: 125319.029 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4767/ 159576 | consumed samples: 110688 | elapsed time per iteration (ms): 15502.5 | learning rate: 3.064E-05 | global batch size: 48 | lm loss: 6.404096E+00 | loss scale: 16384.0 | grad norm: 137718.459 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4768/ 159576 | consumed samples: 110736 | elapsed time per iteration (ms): 15543.8 | learning rate: 3.066E-05 | global batch size: 48 | lm loss: 6.437445E+00 | loss scale: 16384.0 | grad norm: 138623.647 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4769/ 159576 | consumed samples: 110784 | elapsed time per iteration (ms): 15859.0 | learning rate: 3.067E-05 | global batch size: 48 | lm loss: 6.395863E+00 | loss scale: 16384.0 | grad norm: 127878.926 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4770/ 159576 | consumed samples: 110832 | elapsed time per iteration (ms): 15536.9 | learning rate: 3.068E-05 | global batch size: 48 | lm loss: 6.561028E+00 | loss scale: 16384.0 | grad norm: 124917.908 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4771/ 159576 | consumed samples: 110880 | elapsed time per iteration (ms): 15506.9 | learning rate: 3.070E-05 | global batch size: 48 | lm loss: 6.471921E+00 | loss scale: 16384.0 | grad norm: 161855.552 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4772/ 159576 | consumed samples: 110928 | elapsed time per iteration (ms): 15469.5 | learning rate: 3.071E-05 | global batch size: 48 | lm loss: 6.442107E+00 | loss scale: 16384.0 | grad norm: 174619.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4773/ 159576 | consumed samples: 110976 | elapsed time per iteration (ms): 15874.3 | learning rate: 3.072E-05 | global batch size: 48 | lm loss: 6.450697E+00 | loss scale: 16384.0 | grad norm: 128857.784 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4774/ 159576 | consumed samples: 111024 | elapsed time per iteration (ms): 15476.2 | learning rate: 3.074E-05 | global batch size: 48 | lm loss: 6.409184E+00 | loss scale: 16384.0 | grad norm: 167963.478 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4775/ 159576 | consumed samples: 111072 | elapsed time per iteration (ms): 15524.6 | learning rate: 3.075E-05 | global batch size: 48 | lm loss: 6.521546E+00 | loss scale: 16384.0 | grad norm: 160789.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4776/ 159576 | consumed samples: 111120 | elapsed time per iteration (ms): 15522.1 | learning rate: 3.076E-05 | global batch size: 48 | lm loss: 6.392659E+00 | loss scale: 16384.0 | grad norm: 144341.782 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4777/ 159576 | consumed samples: 111168 | elapsed time per iteration (ms): 15807.4 | learning rate: 3.078E-05 | global batch size: 48 | lm loss: 6.295141E+00 | loss scale: 16384.0 | grad norm: 127243.790 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4778/ 159576 | consumed samples: 111216 | elapsed time per iteration (ms): 15569.3 | learning rate: 3.079E-05 | global batch size: 48 | lm loss: 6.327214E+00 | loss scale: 16384.0 | grad norm: 126284.160 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4779/ 159576 | consumed samples: 111264 | elapsed time per iteration (ms): 15403.5 | learning rate: 3.080E-05 | global batch size: 48 | lm loss: 6.573749E+00 | loss scale: 16384.0 | grad norm: 122918.062 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4780/ 159576 | consumed samples: 111312 | elapsed time per iteration (ms): 15381.1 | learning rate: 3.082E-05 | global batch size: 48 | lm loss: 6.433424E+00 | loss scale: 16384.0 | grad norm: 124694.541 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4781/ 159576 | consumed samples: 111360 | elapsed time per iteration (ms): 15664.5 | learning rate: 3.083E-05 | global batch size: 48 | lm loss: 6.469074E+00 | loss scale: 16384.0 | grad norm: 147526.104 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4782/ 159576 | consumed samples: 111408 | elapsed time per iteration (ms): 15406.6 | learning rate: 3.084E-05 | global batch size: 48 | lm loss: 6.349575E+00 | loss scale: 16384.0 | grad norm: 124417.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4783/ 159576 | consumed samples: 111456 | elapsed time per iteration (ms): 15497.8 | learning rate: 3.086E-05 | global batch size: 48 | lm loss: 6.254411E+00 | loss scale: 16384.0 | grad norm: 132978.536 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4784/ 159576 | consumed samples: 111504 | elapsed time per iteration (ms): 15491.3 | learning rate: 3.087E-05 | global batch size: 48 | lm loss: 6.407672E+00 | loss scale: 16384.0 | grad norm: 136226.752 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4785/ 159576 | consumed samples: 111552 | elapsed time per iteration (ms): 15585.7 | learning rate: 3.088E-05 | global batch size: 48 | lm loss: 6.340271E+00 | loss scale: 16384.0 | grad norm: 143403.987 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4786/ 159576 | consumed samples: 111600 | elapsed time per iteration (ms): 15420.5 | learning rate: 3.090E-05 | global batch size: 48 | lm loss: 6.532565E+00 | loss scale: 16384.0 | grad norm: 146250.150 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4787/ 159576 | consumed samples: 111648 | elapsed time per iteration (ms): 15350.6 | learning rate: 3.091E-05 | global batch size: 48 | lm loss: 6.387796E+00 | loss scale: 16384.0 | grad norm: 219665.169 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-24 21:07:15] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition)
[2021-09-24 21:07:15] PULSE: tr8-104B is running for 15:15:04 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8])
iteration 4788/ 159576 | consumed samples: 111696 | elapsed time per iteration (ms): 15408.2 | learning rate: 3.092E-05 | global batch size: 48 | lm loss: 6.385682E+00 | loss scale: 16384.0 | grad norm: 135205.771 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4789/ 159576 | consumed samples: 111744 | elapsed time per iteration (ms): 15723.0 | learning rate: 3.094E-05 | global batch size: 48 | lm loss: 6.382418E+00 | loss scale: 16384.0 | grad norm: 135775.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4790/ 159576 | consumed samples: 111792 | elapsed time per iteration (ms): 15412.3 | learning rate: 3.095E-05 | global batch size: 48 | lm loss: 6.349115E+00 | loss scale: 16384.0 | grad norm: 161890.935 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4791/ 159576 | consumed samples: 111840 | elapsed time per iteration (ms): 15444.3 | learning rate: 3.096E-05 | global batch size: 48 | lm loss: 6.551302E+00 | loss scale: 16384.0 | grad norm: 160659.721 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
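The two PULSE lines above come from the project's job-watchdog script rather than from the training loop: the next SLURM array job (1165978_[1-10%1]) is queued behind the running one (1162855_1) via SLURM's dependency mechanism, and the current allocation on the 'gpu_p13' partition has been up for 15:15:04. The reported runtime is just the difference between the log timestamp and the reported start time; a quick check using only the standard library:

```python
from datetime import datetime

# Arithmetic check of the PULSE status line above.
start = datetime.fromisoformat("2021-09-24T05:52:11")
logged = datetime.fromisoformat("2021-09-24T21:07:15")
print(logged - start)  # -> 15:15:04, matching "running for 15:15:04"
```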
iteration 4792/ 159576 | consumed samples: 111888 | elapsed time per iteration (ms): 15819.0 | learning rate: 3.098E-05 | global batch size: 48 | lm loss: 6.439594E+00 | loss scale: 16384.0 | grad norm: 133779.922 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4793/ 159576 | consumed samples: 111936 | elapsed time per iteration (ms): 15566.2 | learning rate: 3.099E-05 | global batch size: 48 | lm loss: 6.469571E+00 | loss scale: 16384.0 | grad norm: 134021.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4794/ 159576 | consumed samples: 111984 | elapsed time per iteration (ms): 15417.1 | learning rate: 3.100E-05 | global batch size: 48 | lm loss: 6.302731E+00 | loss scale: 16384.0 | grad norm: 144273.145 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4795/ 159576 | consumed samples: 112032 | elapsed time per iteration (ms): 15348.6 | learning rate: 3.102E-05 | global batch size: 48 | lm loss: 6.524598E+00 | loss scale: 16384.0 | grad norm: 173531.750 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4796/ 159576 | consumed samples: 112080 | elapsed time per iteration (ms): 15687.5 | learning rate: 3.103E-05 | global batch size: 48 | lm loss: 6.379292E+00 | loss scale: 16384.0 | grad norm: 135799.927 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4797/ 159576 | consumed samples: 112128 | elapsed time per iteration (ms): 15525.4 | learning rate: 3.104E-05 | global batch size: 48 | lm loss: 6.363866E+00 | loss scale: 16384.0 | grad norm: 157197.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4798/ 159576 | consumed samples: 112176 | elapsed time per iteration (ms): 15407.8 | learning rate: 3.106E-05 | global batch size: 48 | lm loss: 6.301018E+00 | loss scale: 16384.0 | grad norm: 157927.560 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4799/ 159576 | consumed samples: 112224 | elapsed time per iteration (ms): 15420.4 | learning rate: 3.107E-05 | global batch size: 48 | lm loss: 6.529522E+00 | loss scale: 16384.0 | grad norm: 161359.540 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4800/ 159576 | consumed samples: 112272 | elapsed time per iteration (ms): 15797.9 | learning rate: 3.108E-05 | global batch size: 48 | lm loss: 6.347914E+00 | loss scale: 16384.0 | grad norm: 147972.460 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4801/ 159576 | consumed samples: 112320 | elapsed time per iteration (ms): 15327.2 | learning rate: 3.110E-05 | global batch size: 48 | lm loss: 6.375738E+00 | loss scale: 16384.0 | grad norm: 153820.838 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4802/ 159576 | consumed samples: 112368 | elapsed time per iteration (ms): 15430.2 | learning rate: 3.111E-05 | global batch size: 48 | lm loss: 6.380699E+00 | loss scale: 16384.0 | grad norm: 200141.688 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4803/ 159576 | consumed samples: 112416 | elapsed time per iteration (ms): 15437.0 | learning rate: 3.112E-05 | global batch size: 48 | lm loss: 6.346474E+00 | loss scale: 16384.0 | grad norm: 150956.672 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4804/ 159576 | consumed samples: 112464 | elapsed time per iteration (ms): 15932.7 | learning rate: 3.114E-05 | global batch size: 48 | lm loss: 6.424392E+00 | loss scale: 16384.0 | grad norm: 144387.858 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4805/ 159576 | consumed samples: 112512 | elapsed time per iteration (ms): 15535.0 | learning rate: 3.115E-05 | global batch size: 48 | lm loss: 6.327216E+00 | loss scale: 16384.0 | grad norm: 145981.007 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4806/ 159576 | consumed samples: 112560 | elapsed time per iteration (ms): 15433.8 | learning rate: 3.116E-05 | global batch size: 48 | lm loss: 6.352614E+00 | loss scale: 16384.0 | grad norm: 159012.654 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4807/ 159576 | consumed samples: 112608 | elapsed time per iteration (ms): 15389.4 | learning rate: 3.118E-05 | global batch size: 48 | lm loss: 6.523698E+00 | loss scale: 16384.0 | grad norm: 183142.813 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4808/ 159576 | consumed samples: 112656 | elapsed time per iteration (ms): 15811.1 | learning rate: 3.119E-05 | global batch size: 48 | lm loss: 6.425416E+00 | loss scale: 16384.0 | grad norm: 158356.721 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4809/ 159576 | consumed samples: 112704 | elapsed time per iteration (ms): 15390.9 | learning rate: 3.120E-05 | global batch size: 48 | lm loss: 6.460537E+00 | loss scale: 16384.0 | grad norm: 160752.580 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4810/ 159576 | consumed samples: 112752 | elapsed time per iteration (ms): 15403.0 | learning rate: 3.122E-05 | global batch size: 48 | lm loss: 6.358703E+00 | loss scale: 16384.0 | grad norm: 136445.446 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4811/ 159576 | consumed samples: 112800 | elapsed time per iteration (ms): 15361.3 | learning rate: 3.123E-05 | global batch size: 48 | lm loss: 6.445686E+00 | loss scale: 16384.0 | grad norm: 150287.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4812/ 159576 | consumed samples: 112848 | elapsed time per iteration (ms): 15635.2 | learning rate: 3.124E-05 | global batch size: 48 | lm loss: 6.351339E+00 | loss scale: 16384.0 | grad norm: 127746.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4813/ 159576 | consumed samples: 112896 | elapsed time per iteration (ms): 15458.8 | learning rate: 3.126E-05 | global batch size: 48 | lm loss: 6.509888E+00 | loss scale: 16384.0 | grad norm: 142135.548 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4814/ 159576 | consumed samples: 112944 | elapsed time per iteration (ms): 15373.2 | learning rate: 3.127E-05 | global batch size: 48 | lm loss: 6.393768E+00 | loss scale: 16384.0 | grad norm: 140003.150 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4815/ 159576 | consumed samples: 112992 | elapsed time per iteration (ms): 15438.1 | learning rate: 3.128E-05 | global batch size: 48 | lm loss: 6.501161E+00 | loss scale: 16384.0 | grad norm: 148857.005 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4816/ 159576 | consumed samples: 113040 | elapsed time per iteration (ms): 15632.8 | learning rate: 3.130E-05 | global batch size: 48 | lm loss: 6.330061E+00 | loss scale: 16384.0 | grad norm: 147693.703 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4817/ 159576 | consumed samples: 113088 | elapsed time per iteration (ms): 15360.6 | learning rate: 3.131E-05 | global batch size: 48 | lm loss: 6.405270E+00 | loss scale: 16384.0 | grad norm: 135039.455 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4818/ 159576 | consumed samples: 113136 | elapsed time per iteration (ms): 15427.5 | learning rate: 3.132E-05 | global batch size: 48 | lm loss: 6.376327E+00 | loss scale: 16384.0 | grad norm: 144860.784 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4819/ 159576 | consumed samples: 113184 | elapsed time per iteration (ms): 15402.3 | learning rate: 3.134E-05 | global batch size: 48 | lm loss: 6.422782E+00 | loss scale: 16384.0 | grad norm: 185430.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4820/ 159576 | consumed samples: 113232 | elapsed time per iteration (ms): 15872.7 | learning rate: 3.135E-05 | global batch size: 48 | lm loss: 6.447948E+00 | loss scale: 16384.0 | grad norm: 143563.779 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4821/ 159576 | consumed samples: 113280 | elapsed time per iteration (ms): 15475.0 | learning rate: 3.136E-05 | global batch size: 48 | lm loss: 6.419926E+00 | loss scale: 16384.0 | grad norm: 139618.440 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4822/ 159576 | consumed samples: 113328 | elapsed time per iteration (ms): 15479.8 | learning rate: 3.138E-05 | global batch size: 48 | lm loss: 6.307784E+00 | loss scale: 16384.0 | grad norm: 135923.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4823/ 159576 | consumed samples: 113376 | elapsed time per iteration (ms): 15830.9 | learning rate: 3.139E-05 | global batch size: 48 | lm loss: 6.485186E+00 | loss scale: 16384.0 | grad norm: 148878.956 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4824/ 159576 | consumed samples: 113424 | elapsed time per iteration (ms): 15412.5 | learning rate: 3.140E-05 | global batch size: 48 | lm loss: 6.344635E+00 | loss scale: 16384.0 | grad norm: 144634.532 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4825/ 159576 | consumed samples: 113472 | elapsed time per iteration (ms): 15399.2 | learning rate: 3.142E-05 | global batch size: 48 | lm loss: 6.380017E+00 | loss scale: 16384.0 | grad norm: 149087.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4826/ 159576 | consumed samples: 113520 | elapsed time per iteration (ms): 15495.5 | learning rate: 3.143E-05 | global batch size: 48 | lm loss: 6.478100E+00 | loss scale: 16384.0 | grad norm: 157916.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
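A quick throughput snapshot from the entries above: at roughly 15.5 s per iteration and 48 samples per global batch, the run is processing about 3.1 samples/s. The tokens/s figure below assumes 2048-token sequences, which is an assumption about this run's config, and the remaining-time extrapolation ignores the batch-size ramp-up, so treat both as rough:

```python
# Back-of-the-envelope throughput from the log values above.
ms_per_iter = 15500.0      # typical elapsed time per iteration (ms)
global_batch = 48          # samples per iteration
seq_len = 2048             # ASSUMED sequence length for this run

samples_per_s = global_batch / (ms_per_iter / 1000.0)
tokens_per_s = samples_per_s * seq_len
iters_left = 159576 - 4826
days_left = iters_left * (ms_per_iter / 1000.0) / 86400.0
print(f"{samples_per_s:.2f} samples/s, {tokens_per_s:,.0f} tokens/s, "
      f"~{days_left:.0f} days at this pace")  # ~3.10, ~6,342, ~28 days
```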
iteration 4827/ 159576 | consumed samples: 113568 | elapsed time per iteration (ms): 15748.7 | learning rate: 3.144E-05 | global batch size: 48 | lm loss: 6.353170E+00 | loss scale: 16384.0 | grad norm: 130626.129 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4828/ 159576 | consumed samples: 113616 | elapsed time per iteration (ms): 15356.7 | learning rate: 3.146E-05 | global batch size: 48 | lm loss: 6.307143E+00 | loss scale: 16384.0 | grad norm: 152222.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4829/ 159576 | consumed samples: 113664 | elapsed time per iteration (ms): 15426.2 | learning rate: 3.147E-05 | global batch size: 48 | lm loss: 6.284460E+00 | loss scale: 16384.0 | grad norm: 135151.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4830/ 159576 | consumed samples: 113712 | elapsed time per iteration (ms): 15453.2 | learning rate: 3.148E-05 | global batch size: 48 | lm loss: 6.389065E+00 | loss scale: 16384.0 | grad norm: 158822.080 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4831/ 159576 | consumed samples: 113760 | elapsed time per iteration (ms): 15757.8 | learning rate: 3.150E-05 | global batch size: 48 | lm loss: 6.330949E+00 | loss scale: 16384.0 | grad norm: 150077.176 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4832/ 159576 | consumed samples: 113808 | elapsed time per iteration (ms): 8582.4 | learning rate: 3.150E-05 | global batch size: 48 | lm loss: 6.330990E+00 | loss scale: 8192.0 | grad norm: 150077.176 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4833/ 159576 | consumed samples: 113856 | elapsed time per iteration (ms): 14858.8 | learning rate: 3.151E-05 | global batch size: 48 | lm loss: 6.472740E+00 | loss scale: 8192.0 | grad norm: 80806.673 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4834/ 159576 | consumed samples: 113904 | elapsed time per iteration (ms): 15406.5 | learning rate: 3.152E-05 | global batch size: 48 | lm loss: 6.386261E+00 | loss scale: 8192.0 | grad norm: 79982.750 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4835/ 159576 | consumed samples: 113952 | elapsed time per iteration (ms): 15754.6 | learning rate: 3.154E-05 | global batch size: 48 | lm loss: 6.399200E+00 | loss scale: 8192.0 | grad norm: 76427.802 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4836/ 159576 | consumed samples: 114000 | elapsed time per iteration (ms): 15606.6 | learning rate: 3.155E-05 | global batch size: 48 | lm loss: 6.377688E+00 | loss scale: 8192.0 | grad norm: 72730.651 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4837/ 159576 | consumed samples: 114048 | elapsed time per iteration (ms): 15427.9 | learning rate: 3.156E-05 | global batch size: 48 | lm loss: 6.362796E+00 | loss scale: 8192.0 | grad norm: 75031.879 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4838/ 159576 | consumed samples: 114096 | elapsed time per iteration (ms): 15459.9 | learning rate: 3.158E-05 | global batch size: 48 | lm loss: 6.427638E+00 | loss scale: 8192.0 | grad norm: 71627.109 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4839/ 159576 | consumed samples: 114144 | elapsed time per iteration (ms): 15785.4 | learning rate: 3.159E-05 | global batch size: 48 | lm loss: 6.319674E+00 | loss scale: 8192.0 | grad norm: 75857.181 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4840/ 159576 | consumed samples: 114192 | elapsed time per iteration (ms): 15529.1 | learning rate: 3.160E-05 | global batch size: 48 | lm loss: 6.453057E+00 | loss scale: 8192.0 | grad norm: 81110.609 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4841/ 159576 | consumed samples: 114240 | elapsed time per iteration (ms): 15426.5 | learning rate: 3.162E-05 | global batch size: 48 | lm loss: 6.411851E+00 | loss scale: 8192.0 | grad norm: 86983.700 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4842/ 159576 | consumed samples: 114288 | elapsed time per iteration (ms): 15460.5 | learning rate: 3.163E-05 | global batch size: 48 | lm loss: 6.377954E+00 | loss scale: 8192.0 | grad norm: 86981.542 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4843/ 159576 | consumed samples: 114336 | elapsed time per iteration (ms): 15821.2 | learning rate: 3.164E-05 | global batch size: 48 | lm loss: 6.577933E+00 | loss scale: 8192.0 | grad norm: 91346.895 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4844/ 159576 | consumed samples: 114384 | elapsed time per iteration (ms): 15501.1 | learning rate: 3.166E-05 | global batch size: 48 | lm loss: 6.404775E+00 | loss scale: 8192.0 | grad norm: 73191.069 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4845/ 159576 | consumed samples: 114432 | elapsed time per iteration (ms): 15559.3 | learning rate: 3.167E-05 | global batch size: 48 | lm loss: 6.405911E+00 | loss scale: 8192.0 | grad norm: 77252.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4846/ 159576 | consumed samples: 114480 | elapsed time per iteration (ms): 15521.7 | learning rate: 3.168E-05 | global batch size: 48 | lm loss: 6.505279E+00 | loss scale: 8192.0 | grad norm: 70335.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4847/ 159576 | consumed samples: 114528 | elapsed time per iteration (ms): 15925.0 | learning rate: 3.170E-05 | global batch size: 48 | lm loss: 6.438465E+00 | loss scale: 8192.0 | grad norm: 73213.704 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4848/ 159576 | consumed samples: 114576 | elapsed time per iteration (ms): 15612.2 | learning rate: 3.171E-05 | global batch size: 48 | lm loss: 6.452498E+00 | loss scale: 8192.0 | grad norm: 78502.943 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4849/ 159576 | consumed samples: 114624 | elapsed time per iteration (ms): 15443.4 | learning rate: 3.172E-05 | global batch size: 48 | lm loss: 6.394375E+00 | loss scale: 8192.0 | grad norm: 87781.535 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
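Iteration 4832 above shows the dynamic loss scaler backing off: the scale halves from 16384.0 to 8192.0, the learning rate repeats 3.150E-05 instead of advancing, lm loss and grad norm repeat iteration 4831's values, and the step finishes in ~8.6 s instead of ~15.5 s, all consistent with an fp16 overflow step whose optimizer update was skipped. The reported grad norm also drops by roughly half from then on, which suggests it is logged in loss-scaled units here. A minimal sketch of the usual back-off/growth policy; the constants are common defaults, assumed rather than read from this run's config:

```python
class DynamicLossScaler:
    """Sketch of fp16 dynamic loss scaling: halve the scale on overflow
    and skip the step; double it again after a run of clean steps."""

    def __init__(self, scale=16384.0, backoff=0.5, growth=2.0,
                 growth_interval=1000, min_scale=1.0):
        self.scale = scale
        self.backoff = backoff
        self.growth = growth
        self.growth_interval = growth_interval
        self.min_scale = min_scale
        self._clean_steps = 0

    def update(self, found_overflow: bool) -> bool:
        """Return True when the optimizer step should be skipped."""
        if found_overflow:
            self.scale = max(self.scale * self.backoff, self.min_scale)
            self._clean_steps = 0
            return True
        self._clean_steps += 1
        if self._clean_steps % self.growth_interval == 0:
            self.scale *= self.growth
        return False
```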
iteration 4850/ 159576 | consumed samples: 114672 | elapsed time per iteration (ms): 15479.4 | learning rate: 3.174E-05 | global batch size: 48 | lm loss: 6.435881E+00 | loss scale: 8192.0 | grad norm: 73932.494 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4851/ 159576 | consumed samples: 114720 | elapsed time per iteration (ms): 15706.9 | learning rate: 3.175E-05 | global batch size: 48 | lm loss: 6.482435E+00 | loss scale: 8192.0 | grad norm: 80407.010 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4852/ 159576 | consumed samples: 114768 | elapsed time per iteration (ms): 15526.6 | learning rate: 3.176E-05 | global batch size: 48 | lm loss: 6.479346E+00 | loss scale: 8192.0 | grad norm: 88804.640 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4853/ 159576 | consumed samples: 114816 | elapsed time per iteration (ms): 15581.7 | learning rate: 3.178E-05 | global batch size: 48 | lm loss: 6.398011E+00 | loss scale: 8192.0 | grad norm: 85238.079 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4854/ 159576 | consumed samples: 114864 | elapsed time per iteration (ms): 15591.6 | learning rate: 3.179E-05 | global batch size: 48 | lm loss: 6.439957E+00 | loss scale: 8192.0 | grad norm: 79088.978 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4855/ 159576 | consumed samples: 114912 | elapsed time per iteration (ms): 15588.2 | learning rate: 3.180E-05 | global batch size: 48 | lm loss: 6.525852E+00 | loss scale: 8192.0 | grad norm: 86759.095 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4856/ 159576 | consumed samples: 114960 | elapsed time per iteration (ms): 15491.8 | learning rate: 3.182E-05 | global batch size: 48 | lm loss: 6.406517E+00 | loss scale: 8192.0 | grad norm: 84644.761 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4857/ 159576 | consumed samples: 115008 | elapsed time per iteration (ms): 15455.8 | learning rate: 3.183E-05 | global batch size: 48 | lm loss: 6.427845E+00 | loss scale: 8192.0 | grad norm: 95490.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4858/ 159576 | consumed samples: 115056 | elapsed time per iteration (ms): 15508.2 | learning rate: 3.184E-05 | global batch size: 48 | lm loss: 6.500411E+00 | loss scale: 8192.0 | grad norm: 101236.693 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4859/ 159576 | consumed samples: 115104 | elapsed time per iteration (ms): 15652.7 | learning rate: 3.186E-05 | global batch size: 48 | lm loss: 6.364994E+00 | loss scale: 8192.0 | grad norm: 91582.098 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4860/ 159576 | consumed samples: 115152 | elapsed time per iteration (ms): 15517.9 | learning rate: 3.187E-05 | global batch size: 48 | lm loss: 6.449871E+00 | loss scale: 8192.0 | grad norm: 66096.086 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4861/ 159576 | consumed samples: 115200 | elapsed time per iteration (ms): 15569.1 | learning rate: 3.188E-05 | global batch size: 48 | lm loss: 6.364583E+00 | loss scale: 8192.0 | grad norm: 83574.580 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4862/ 159576 | consumed samples: 115248 | elapsed time per iteration (ms): 15872.9 | learning rate: 3.189E-05 | global batch size: 48 | lm loss: 6.322206E+00 | loss scale: 8192.0 | grad norm: 76576.722 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4863/ 159576 | consumed samples: 115296 | elapsed time per iteration (ms): 15519.6 | learning rate: 3.191E-05 | global batch size: 48 | lm loss: 6.475718E+00 | loss scale: 8192.0 | grad norm: 68002.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4864/ 159576 | consumed samples: 115344 | elapsed time per iteration (ms): 15516.6 | learning rate: 3.192E-05 | global batch size: 48 | lm loss: 6.312770E+00 | loss scale: 8192.0 | grad norm: 83359.552 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4865/ 159576 | consumed samples: 115392 | elapsed time per iteration (ms): 15489.9 | learning rate: 3.193E-05 | global batch size: 48 | lm loss: 6.447346E+00 | loss scale: 8192.0 | grad norm: 79898.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4866/ 159576 | consumed samples: 115440 | elapsed time per iteration (ms): 15854.0 | learning rate: 3.195E-05 | global batch size: 48 | lm loss: 6.343767E+00 | loss scale: 8192.0 | grad norm: 82915.939 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4867/ 159576 | consumed samples: 115488 | elapsed time per iteration (ms): 15538.2 | learning rate: 3.196E-05 | global batch size: 48 | lm loss: 6.421945E+00 | loss scale: 8192.0 | grad norm: 76629.129 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4868/ 159576 | consumed samples: 115536 | elapsed time per iteration (ms): 15524.2 | learning rate: 3.197E-05 | global batch size: 48 | lm loss: 6.402726E+00 | loss scale: 8192.0 | grad norm: 75429.794 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4869/ 159576 | consumed samples: 115584 | elapsed time per iteration (ms): 15553.9 | learning rate: 3.199E-05 | global batch size: 48 | lm loss: 6.417988E+00 | loss scale: 8192.0 | grad norm: 82790.972 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4870/ 159576 | consumed samples: 115632 | elapsed time per iteration (ms): 15916.9 | learning rate: 3.200E-05 | global batch size: 48 | lm loss: 6.289523E+00 | loss scale: 8192.0 | grad norm: 77156.616 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4871/ 159576 | consumed samples: 115680 | elapsed time per iteration (ms): 15548.8 | learning rate: 3.201E-05 | global batch size: 48 | lm loss: 6.359477E+00 | loss scale: 8192.0 | grad norm: 94063.189 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4872/ 159576 | consumed samples: 115728 | elapsed time per iteration (ms): 15482.5 | learning rate: 3.203E-05 | global batch size: 48 | lm loss: 6.386482E+00 | loss scale: 8192.0 | grad norm: 70658.588 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4873/ 159576 | consumed samples: 115776 | elapsed time per iteration (ms): 15555.0 | learning rate: 3.204E-05 | global batch size: 48 | lm loss: 6.524825E+00 | loss scale: 8192.0 | grad norm: 86322.654 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4874/ 159576 | consumed samples: 115824 | elapsed time per iteration (ms): 15950.6 | learning rate: 3.205E-05 | global batch size: 48 | lm loss: 6.358710E+00 | loss scale: 8192.0 | grad norm: 73619.690 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4875/ 159576 | consumed samples: 115872 | elapsed time per iteration (ms): 15559.5 | learning rate: 3.207E-05 | global batch size: 48 | lm loss: 6.536497E+00 | loss scale: 8192.0 | grad norm: 89786.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4876/ 159576 | consumed samples: 115920 | elapsed time per iteration (ms): 15463.5 | learning rate: 3.208E-05 | global batch size: 48 | lm loss: 6.427877E+00 | loss scale: 8192.0 | grad norm: 78839.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4877/ 159576 | consumed samples: 115968 | elapsed time per iteration (ms): 15525.4 | learning rate: 3.209E-05 | global batch size: 48 | lm loss: 6.471958E+00 | loss scale: 8192.0 | grad norm: 76472.776 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4878/ 159576 | consumed samples: 116016 | elapsed time per iteration (ms): 15732.8 | learning rate: 3.211E-05 | global batch size: 48 | lm loss: 6.437389E+00 | loss scale: 8192.0 | grad norm: 86320.939 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4879/ 159576 | consumed samples: 116064 | elapsed time per iteration (ms): 15464.9 | learning rate: 3.212E-05 | global batch size: 48 | lm loss: 6.365283E+00 | loss scale: 8192.0 | grad norm: 82080.986 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4880/ 159576 | consumed samples: 116112 | elapsed time per iteration (ms): 15552.2 | learning rate: 3.213E-05 | global batch size: 48 | lm loss: 6.408097E+00 | loss scale: 8192.0 | grad norm: 79728.972 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4881/ 159576 | consumed samples: 116160 | elapsed time per iteration (ms): 15532.2 | learning rate: 3.215E-05 | global batch size: 48 | lm loss: 6.425485E+00 | loss scale: 8192.0 | grad norm: 102265.159 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4882/ 159576 | consumed samples: 116208 | elapsed time per iteration (ms): 15707.7 | learning rate: 3.216E-05 | global batch size: 48 | lm loss: 6.276470E+00 | loss scale: 8192.0 | grad norm: 93438.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4883/ 159576 | consumed samples: 116256 | elapsed time per iteration (ms): 15592.8 | learning rate: 3.217E-05 | global batch size: 48 | lm loss: 6.487882E+00 | loss scale: 8192.0 | grad norm: 85760.044 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4884/ 159576 | consumed samples: 116304 | elapsed time per iteration (ms): 15486.2 | learning rate: 3.219E-05 | global batch size: 48 | lm loss: 6.412776E+00 | loss scale: 8192.0 | grad norm: 84281.777 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
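The learning-rate column is still warming up through this whole stretch: it climbs from 2.992E-05 at iteration 4713 to 3.280E-05 at iteration 4930, about 1.3e-08 per iteration (the printed four-digit values advance in alternating 1e-8/2e-8 steps, which is just rounding of a smooth schedule). That is consistent with the sample-indexed linear warmup Megatron-style schedules use, though the warmup target and length are not visible in this excerpt:

```python
# Rate-of-change check on the learning-rate column (values from the log).
lr_a, it_a = 2.992e-05, 4713
lr_b, it_b = 3.280e-05, 4930
per_iter = (lr_b - lr_a) / (it_b - it_a)
per_sample = per_iter / 48   # 48 samples consumed per iteration here
print(f"{per_iter:.2e}/iteration, {per_sample:.2e}/sample")
# -> ~1.33e-08 per iteration; warmup target/length are outside this excerpt
```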
iteration 4885/ 159576 | consumed samples: 116352 | elapsed time per iteration (ms): 15807.2 | learning rate: 3.220E-05 | global batch size: 48 | lm loss: 6.340213E+00 | loss scale: 8192.0 | grad norm: 79000.522 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4886/ 159576 | consumed samples: 116400 | elapsed time per iteration (ms): 15690.6 | learning rate: 3.221E-05 | global batch size: 48 | lm loss: 6.368945E+00 | loss scale: 8192.0 | grad norm: 101421.475 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4887/ 159576 | consumed samples: 116448 | elapsed time per iteration (ms): 15490.9 | learning rate: 3.223E-05 | global batch size: 48 | lm loss: 6.181931E+00 | loss scale: 8192.0 | grad norm: 80306.960 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4888/ 159576 | consumed samples: 116496 | elapsed time per iteration (ms): 15541.0 | learning rate: 3.224E-05 | global batch size: 48 | lm loss: 6.508174E+00 | loss scale: 8192.0 | grad norm: 88863.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4889/ 159576 | consumed samples: 116544 | elapsed time per iteration (ms): 15795.9 | learning rate: 3.225E-05 | global batch size: 48 | lm loss: 6.362309E+00 | loss scale: 8192.0 | grad norm: 82730.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4890/ 159576 | consumed samples: 116592 | elapsed time per iteration (ms): 15612.5 | learning rate: 3.227E-05 | global batch size: 48 | lm loss: 6.457442E+00 | loss scale: 8192.0 | grad norm: 77751.832 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4891/ 159576 | consumed samples: 116640 | elapsed time per iteration (ms): 15523.7 | learning rate: 3.228E-05 | global batch size: 48 | lm loss: 6.382168E+00 | loss scale: 8192.0 | grad norm: 95335.147 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4892/ 159576 | consumed samples: 116688 | elapsed time per iteration (ms): 15565.3 | learning rate: 3.229E-05 | global batch size: 48 | lm loss: 6.443634E+00 | loss scale: 8192.0 | grad norm: 141532.607 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4893/ 159576 | consumed samples: 116736 | elapsed time per iteration (ms): 15920.8 | learning rate: 3.231E-05 | global batch size: 48 | lm loss: 6.475467E+00 | loss scale: 8192.0 | grad norm: 99006.769 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4894/ 159576 | consumed samples: 116784 | elapsed time per iteration (ms): 15438.9 | learning rate: 3.232E-05 | global batch size: 48 | lm loss: 6.465964E+00 | loss scale: 8192.0 | grad norm: 104819.919 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4895/ 159576 | consumed samples: 116832 | elapsed time per iteration (ms): 15486.6 | learning rate: 3.233E-05 | global batch size: 48 | lm loss: 6.355396E+00 | loss scale: 8192.0 | grad norm: 88645.070 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4896/ 159576 | consumed samples: 116880 | elapsed time per iteration (ms): 15530.2 | learning rate: 3.235E-05 | global batch size: 48 | lm loss: 6.397956E+00 | loss scale: 8192.0 | grad norm: 97080.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4897/ 159576 | consumed samples: 116928 | elapsed time per iteration (ms): 15972.1 | learning rate: 3.236E-05 | global batch size: 48 | lm loss: 6.376213E+00 | loss scale: 8192.0 | grad norm: 91571.932 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4898/ 159576 | consumed samples: 116976 | elapsed time per iteration (ms): 15582.4 | learning rate: 3.237E-05 | global batch size: 48 | lm loss: 6.338162E+00 | loss scale: 8192.0 | grad norm: 95029.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4899/ 159576 | consumed samples: 117024 | elapsed time per iteration (ms): 15514.7 | learning rate: 3.239E-05 | global batch size: 48 | lm loss: 6.420194E+00 | loss scale: 8192.0 | grad norm: 115966.472 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4900/ 159576 | consumed samples: 117072 | elapsed time per iteration (ms): 15492.3 | learning rate: 3.240E-05 | global batch size: 48 | lm loss: 6.472268E+00 | loss scale: 8192.0 | grad norm: 117112.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4901/ 159576 | consumed samples: 117120 | elapsed time per iteration (ms): 15707.8 | learning rate: 3.241E-05 | global batch size: 48 | lm loss: 6.365590E+00 | loss scale: 8192.0 | grad norm: 126111.497 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4902/ 159576 | consumed samples: 117168 | elapsed time per iteration (ms): 15440.6 | learning rate: 3.243E-05 | global batch size: 48 | lm loss: 6.341323E+00 | loss scale: 8192.0 | grad norm: 141040.178 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4903/ 159576 | consumed samples: 117216 | elapsed time per iteration (ms): 15486.6 | learning rate: 3.244E-05 | global batch size: 48 | lm loss: 6.294356E+00 | loss scale: 8192.0 | grad norm: 92893.758 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4904/ 159576 | consumed samples: 117264 | elapsed time per iteration (ms): 15374.1 | learning rate: 3.245E-05 | global batch size: 48 | lm loss: 6.459288E+00 | loss scale: 8192.0 | grad norm: 105593.680 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4905/ 159576 | consumed samples: 117312 | elapsed time per iteration (ms): 15525.3 | learning rate: 3.247E-05 | global batch size: 48 | lm loss: 6.321597E+00 | loss scale: 8192.0 | grad norm: 92345.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4906/ 159576 | consumed samples: 117360 | elapsed time per iteration (ms): 15464.1 | learning rate: 3.248E-05 | global batch size: 48 | lm loss: 6.394690E+00 | loss scale: 8192.0 | grad norm: 115046.817 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4907/ 159576 | consumed samples: 117408 | elapsed time per iteration (ms): 15463.2 | learning rate: 3.249E-05 | global batch size: 48 | lm loss: 6.382209E+00 | loss scale: 8192.0 | grad norm: 129712.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4908/ 159576 | consumed samples: 117456 | elapsed time per iteration (ms): 15513.8 | learning rate: 3.251E-05 | global batch size: 48 | lm loss: 6.406621E+00 | loss scale: 8192.0 | grad norm: 97342.857 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4909/ 159576 | consumed samples: 117504 | elapsed time per iteration (ms): 15695.2 | learning rate: 3.252E-05 | global batch size: 48 | lm loss: 6.313143E+00 | loss scale: 8192.0 | grad norm: 113026.841 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4910/ 159576 | consumed samples: 117552 | elapsed time per iteration (ms): 15443.0 | learning rate: 3.253E-05 | global batch size: 48 | lm loss: 6.450486E+00 | loss scale: 8192.0 | grad norm: 95063.553 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4911/ 159576 | consumed samples: 117600 | elapsed time per iteration (ms): 15416.6 | learning rate: 3.255E-05 | global batch size: 48 | lm loss: 6.485876E+00 | loss scale: 8192.0 | grad norm: 102064.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4912/ 159576 | consumed samples: 117648 | elapsed time per iteration (ms): 15823.7 | learning rate: 3.256E-05 | global batch size: 48 | lm loss: 6.276315E+00 | loss scale: 8192.0 | grad norm: 114959.499 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4913/ 159576 | consumed samples: 117696 | elapsed time per iteration (ms): 15625.5 | learning rate: 3.257E-05 | global batch size: 48 | lm loss: 6.405933E+00 | loss scale: 8192.0 | grad norm: 117232.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4914/ 159576 | consumed samples: 117744 | elapsed time per iteration (ms): 15455.3 | learning rate: 3.259E-05 | global batch size: 48 | lm loss: 6.233083E+00 | loss scale: 8192.0 | grad norm: 109853.141 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4915/ 159576 | consumed samples: 117792 | elapsed time per iteration (ms): 15594.3 | learning rate: 3.260E-05 | global batch size: 48 | lm loss: 6.418136E+00 | loss scale: 8192.0 | grad norm: 108180.861 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4916/ 159576 | consumed samples: 117840 | elapsed time per iteration (ms): 15954.3 | learning rate: 3.261E-05 | global batch size: 48 | lm loss: 6.385183E+00 | loss scale: 8192.0 | grad norm: 103614.011 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4917/ 159576 | consumed samples: 117888 | elapsed time per iteration (ms): 15458.8 | learning rate: 3.263E-05 | global batch size: 48 | lm loss: 6.341071E+00 | loss scale: 8192.0 | grad norm: 87833.153 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4918/ 159576 | consumed samples: 117936 | elapsed time per iteration (ms): 15501.3 | learning rate: 3.264E-05 | global batch size: 48 | lm loss: 6.418250E+00 | loss scale: 8192.0 | grad norm: 91681.912 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4919/ 159576 | consumed samples: 117984 | elapsed time per iteration (ms): 15446.3 | learning rate: 3.265E-05 | global batch size: 48 | lm loss: 6.298886E+00 | loss scale: 8192.0 | grad norm: 98048.635 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4920/ 159576 | consumed samples: 118032 | elapsed time per iteration (ms): 15905.0 | learning rate: 3.267E-05 | global batch size: 48 | lm loss: 6.413123E+00 | loss scale: 8192.0 | grad norm: 103541.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4921/ 159576 | consumed samples: 118080 | elapsed time per iteration (ms): 15416.1 | learning rate: 3.268E-05 | global batch size: 48 | lm loss: 6.282074E+00 | loss scale: 8192.0 | grad norm: 100452.621 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4922/ 159576 | consumed samples: 118128 | elapsed time per iteration (ms): 15499.9 | learning rate: 3.269E-05 | global batch size: 48 | lm loss: 6.371088E+00 | loss scale: 8192.0 | grad norm: 118401.522 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4923/ 159576 | consumed samples: 118176 | elapsed time per iteration (ms): 15522.6 | learning rate: 3.271E-05 | global batch size: 48 | lm loss: 6.399379E+00 | loss scale: 8192.0 | grad norm: 100877.549 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4924/ 159576 | consumed samples: 118224 | elapsed time per iteration (ms): 15859.1 | learning rate: 3.272E-05 | global batch size: 48 | lm loss: 6.450886E+00 | loss scale: 8192.0 | grad norm: 115997.698 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4925/ 159576 | consumed samples: 118272 | elapsed time per iteration (ms): 15622.0 | learning rate: 3.273E-05 | global batch size: 48 | lm loss: 6.412412E+00 | loss scale: 8192.0 | grad norm: 121229.477 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4926/ 159576 | consumed samples: 118320 | elapsed time per iteration (ms): 15522.5 | learning rate: 3.275E-05 | global batch size: 48 | lm loss: 6.276751E+00 | loss scale: 8192.0 | grad norm: 127323.029 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4927/ 159576 | consumed samples: 118368 | elapsed time per iteration (ms): 15489.0 | learning rate: 3.276E-05 | global batch size: 48 | lm loss: 6.328137E+00 | loss scale: 8192.0 | grad norm: 109231.572 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4928/ 159576 | consumed samples: 118416 | elapsed time per iteration (ms): 15679.3 | learning rate: 3.277E-05 | global batch size: 48 | lm loss: 6.343997E+00 | loss scale: 8192.0 | grad norm: 94463.087 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4929/ 159576 | consumed samples: 118464 | elapsed time per iteration (ms): 15506.4 | learning rate: 3.279E-05 | global batch size: 48 | lm loss: 6.367960E+00 | loss scale: 8192.0 | grad norm: 104644.038 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4930/ 159576 | consumed samples: 118512 | elapsed time per iteration (ms): 15552.6 | learning rate: 3.280E-05 | global batch size: 48 | lm loss: 6.375040E+00 | loss scale: 8192.0 | grad norm: 108080.731 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time
(ms) iteration 4931/ 159576 | consumed samples: 118560 | elapsed time per iteration (ms): 15566.6 | learning rate: 3.281E-05 | global batch size: 48 | lm loss: 6.468022E+00 | loss scale: 8192.0 | grad norm: 98813.039 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4932/ 159576 | consumed samples: 118608 | elapsed time per iteration (ms): 15633.8 | learning rate: 3.283E-05 | global batch size: 48 | lm loss: 6.478949E+00 | loss scale: 8192.0 | grad norm: 119522.152 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4933/ 159576 | consumed samples: 118656 | elapsed time per iteration (ms): 15451.3 | learning rate: 3.284E-05 | global batch size: 48 | lm loss: 6.415487E+00 | loss scale: 8192.0 | grad norm: 121029.519 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4934/ 159576 | consumed samples: 118704 | elapsed time per iteration (ms): 15537.9 | learning rate: 3.285E-05 | global batch size: 48 | lm loss: 6.436414E+00 | loss scale: 8192.0 | grad norm: 114108.101 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4935/ 159576 | consumed samples: 118752 | elapsed time per iteration (ms): 15442.4 | learning rate: 3.287E-05 | global batch size: 48 | lm loss: 6.380546E+00 | loss scale: 8192.0 | grad norm: 102153.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4936/ 159576 | consumed samples: 118800 | elapsed time per iteration (ms): 15674.3 | learning rate: 3.288E-05 | global batch size: 48 | lm loss: 6.524636E+00 | loss scale: 8192.0 | grad norm: 89702.742 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4937/ 159576 | consumed samples: 118848 | elapsed time per iteration (ms): 15501.6 | learning rate: 3.289E-05 | global batch size: 48 | lm loss: 6.352899E+00 | loss scale: 8192.0 | grad norm: 106241.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4938/ 159576 | consumed samples: 118896 | elapsed time per iteration (ms): 15494.9 | learning rate: 3.291E-05 | global batch size: 48 | lm loss: 6.292633E+00 | loss scale: 8192.0 | grad norm: 95129.966 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4939/ 159576 | consumed samples: 118944 | elapsed time per iteration (ms): 15936.8 | learning rate: 3.292E-05 | global batch size: 48 | lm loss: 6.337314E+00 | loss scale: 8192.0 | grad norm: 120723.828 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4940/ 159576 | consumed samples: 118992 | elapsed time per iteration (ms): 15531.1 | learning rate: 3.293E-05 | global batch size: 48 | lm loss: 6.391080E+00 | loss scale: 8192.0 | grad norm: 145548.804 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4941/ 159576 | consumed samples: 119040 | elapsed time per iteration (ms): 15466.0 | learning rate: 3.295E-05 | global batch size: 48 | lm loss: 6.343481E+00 | loss scale: 8192.0 | grad norm: 211104.534 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4942/ 159576 | consumed samples: 119088 | elapsed time per iteration (ms): 15505.4 | learning rate: 3.296E-05 | global batch size: 48 | lm loss: 6.528688E+00 | loss scale: 
8192.0 | grad norm: 140909.560 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4943/ 159576 | consumed samples: 119136 | elapsed time per iteration (ms): 15830.2 | learning rate: 3.297E-05 | global batch size: 48 | lm loss: 6.411016E+00 | loss scale: 8192.0 | grad norm: 127370.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4944/ 159576 | consumed samples: 119184 | elapsed time per iteration (ms): 15400.2 | learning rate: 3.299E-05 | global batch size: 48 | lm loss: 6.483131E+00 | loss scale: 8192.0 | grad norm: 104651.898 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4945/ 159576 | consumed samples: 119232 | elapsed time per iteration (ms): 15491.5 | learning rate: 3.300E-05 | global batch size: 48 | lm loss: 6.509373E+00 | loss scale: 8192.0 | grad norm: 129067.934 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4946/ 159576 | consumed samples: 119280 | elapsed time per iteration (ms): 15557.0 | learning rate: 3.301E-05 | global batch size: 48 | lm loss: 6.338033E+00 | loss scale: 8192.0 | grad norm: 111737.692 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4947/ 159576 | consumed samples: 119328 | elapsed time per iteration (ms): 15880.4 | learning rate: 3.303E-05 | global batch size: 48 | lm loss: 6.346412E+00 | loss scale: 8192.0 | grad norm: 105173.160 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4948/ 159576 | consumed samples: 119376 | elapsed time per iteration (ms): 15470.3 | learning rate: 3.304E-05 | global batch size: 48 | lm loss: 6.433241E+00 | loss scale: 8192.0 | grad norm: 117253.932 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4949/ 159576 | consumed samples: 119424 | elapsed time per iteration (ms): 15464.0 | learning rate: 3.305E-05 | global batch size: 48 | lm loss: 6.408391E+00 | loss scale: 8192.0 | grad norm: 100408.960 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4950/ 159576 | consumed samples: 119472 | elapsed time per iteration (ms): 15498.5 | learning rate: 3.307E-05 | global batch size: 48 | lm loss: 6.403716E+00 | loss scale: 8192.0 | grad norm: 124240.587 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4951/ 159576 | consumed samples: 119520 | elapsed time per iteration (ms): 15815.9 | learning rate: 3.308E-05 | global batch size: 48 | lm loss: 6.389519E+00 | loss scale: 8192.0 | grad norm: 100463.890 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4952/ 159576 | consumed samples: 119568 | elapsed time per iteration (ms): 15557.3 | learning rate: 3.309E-05 | global batch size: 48 | lm loss: 6.505785E+00 | loss scale: 8192.0 | grad norm: 106487.068 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4953/ 159576 | consumed samples: 119616 | elapsed time per iteration (ms): 15479.5 | learning rate: 3.311E-05 | global batch size: 48 | lm loss: 6.381755E+00 | loss scale: 8192.0 | grad norm: 102228.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4954/ 159576 | consumed samples: 119664 | elapsed time 
per iteration (ms): 15481.8 | learning rate: 3.312E-05 | global batch size: 48 | lm loss: 6.379836E+00 | loss scale: 8192.0 | grad norm: 118394.733 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4955/ 159576 | consumed samples: 119712 | elapsed time per iteration (ms): 15784.5 | learning rate: 3.313E-05 | global batch size: 48 | lm loss: 6.475849E+00 | loss scale: 8192.0 | grad norm: 122087.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4956/ 159576 | consumed samples: 119760 | elapsed time per iteration (ms): 15436.2 | learning rate: 3.315E-05 | global batch size: 48 | lm loss: 6.490977E+00 | loss scale: 8192.0 | grad norm: 123577.161 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4957/ 159576 | consumed samples: 119808 | elapsed time per iteration (ms): 15420.1 | learning rate: 3.316E-05 | global batch size: 48 | lm loss: 6.418243E+00 | loss scale: 8192.0 | grad norm: 146260.906 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4958/ 159576 | consumed samples: 119856 | elapsed time per iteration (ms): 15433.1 | learning rate: 3.317E-05 | global batch size: 48 | lm loss: 6.375823E+00 | loss scale: 8192.0 | grad norm: 102943.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4959/ 159576 | consumed samples: 119904 | elapsed time per iteration (ms): 15549.7 | learning rate: 3.319E-05 | global batch size: 48 | lm loss: 6.454865E+00 | loss scale: 8192.0 | grad norm: 95733.097 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4960/ 159576 | consumed samples: 119952 | elapsed time per iteration (ms): 15477.0 | learning rate: 3.320E-05 | global batch size: 48 | lm loss: 6.376845E+00 | loss scale: 8192.0 | grad norm: 105409.137 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4961/ 159576 | consumed samples: 120000 | elapsed time per iteration (ms): 15553.6 | learning rate: 3.321E-05 | global batch size: 48 | lm loss: 6.369764E+00 | loss scale: 8192.0 | grad norm: 100426.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4962/ 159576 | consumed samples: 120048 | elapsed time per iteration (ms): 15567.9 | learning rate: 3.323E-05 | global batch size: 48 | lm loss: 6.386555E+00 | loss scale: 8192.0 | grad norm: 100112.758 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4963/ 159576 | consumed samples: 120096 | elapsed time per iteration (ms): 15733.5 | learning rate: 3.324E-05 | global batch size: 48 | lm loss: 6.487816E+00 | loss scale: 8192.0 | grad norm: 117343.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4964/ 159576 | consumed samples: 120144 | elapsed time per iteration (ms): 15368.5 | learning rate: 3.325E-05 | global batch size: 48 | lm loss: 6.415962E+00 | loss scale: 8192.0 | grad norm: 98866.878 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4965/ 159576 | consumed samples: 120192 | elapsed time per iteration (ms): 15477.1 | learning rate: 3.327E-05 | global batch size: 48 | lm loss: 6.374081E+00 | loss scale: 8192.0 | grad norm: 124767.543 | num zeros: 0.0 | number of skipped 
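Note: throughout this stretch, consumed samples advance by exactly the global batch size each step (119904 - 119856 = 48 between iterations 4958 and 4959 above), so the two counters are interconvertible while the batch size stays at 48. A small check; the helper name is mine:

    # Sample bookkeeping implied by the records above: with a constant
    # global batch size of 48, consumed samples are a linear function of
    # the iteration index. Reference point taken from the log (iteration
    # 4959 -> 119904 samples); only valid while no batch-size rampup is
    # in effect.
    GLOBAL_BATCH_SIZE = 48

    def consumed_samples(iteration, ref_iter=4959, ref_samples=119904):
        return ref_samples + (iteration - ref_iter) * GLOBAL_BATCH_SIZE

    assert consumed_samples(4958) == 119856  # matches the log
    assert consumed_samples(5000) == 121872  # matches the log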
iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4966/ 159576 | consumed samples: 120240 | elapsed time per iteration (ms): 15922.3 | learning rate: 3.328E-05 | global batch size: 48 | lm loss: 6.338925E+00 | loss scale: 8192.0 | grad norm: 229637.846 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4967/ 159576 | consumed samples: 120288 | elapsed time per iteration (ms): 15438.9 | learning rate: 3.329E-05 | global batch size: 48 | lm loss: 6.318257E+00 | loss scale: 8192.0 | grad norm: 138618.442 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4968/ 159576 | consumed samples: 120336 | elapsed time per iteration (ms): 15527.5 | learning rate: 3.331E-05 | global batch size: 48 | lm loss: 6.406815E+00 | loss scale: 8192.0 | grad norm: 101628.651 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4969/ 159576 | consumed samples: 120384 | elapsed time per iteration (ms): 15565.4 | learning rate: 3.332E-05 | global batch size: 48 | lm loss: 6.381866E+00 | loss scale: 8192.0 | grad norm: 138150.093 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4970/ 159576 | consumed samples: 120432 | elapsed time per iteration (ms): 15898.0 | learning rate: 3.333E-05 | global batch size: 48 | lm loss: 6.305198E+00 | loss scale: 8192.0 | grad norm: 94133.912 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4971/ 159576 | consumed samples: 120480 | elapsed time per iteration (ms): 15413.4 | learning rate: 3.335E-05 | global batch size: 48 | lm loss: 6.276737E+00 | loss scale: 8192.0 | grad norm: 89212.813 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4972/ 159576 | consumed samples: 120528 | elapsed time per iteration (ms): 15553.0 | learning rate: 3.336E-05 | global batch size: 48 | lm loss: 6.404760E+00 | loss scale: 8192.0 | grad norm: 119702.116 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4973/ 159576 | consumed samples: 120576 | elapsed time per iteration (ms): 15428.6 | learning rate: 3.337E-05 | global batch size: 48 | lm loss: 6.225817E+00 | loss scale: 8192.0 | grad norm: 94382.038 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4974/ 159576 | consumed samples: 120624 | elapsed time per iteration (ms): 15767.2 | learning rate: 3.339E-05 | global batch size: 48 | lm loss: 6.442757E+00 | loss scale: 8192.0 | grad norm: 99692.552 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4975/ 159576 | consumed samples: 120672 | elapsed time per iteration (ms): 15514.4 | learning rate: 3.340E-05 | global batch size: 48 | lm loss: 6.472607E+00 | loss scale: 8192.0 | grad norm: 112543.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4976/ 159576 | consumed samples: 120720 | elapsed time per iteration (ms): 15602.8 | learning rate: 3.341E-05 | global batch size: 48 | lm loss: 6.382205E+00 | loss scale: 8192.0 | grad norm: 97309.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4977/ 159576 | consumed samples: 120768 | elapsed time per iteration (ms): 15584.4 | learning rate: 3.343E-05 | global batch 
size: 48 | lm loss: 6.527099E+00 | loss scale: 8192.0 | grad norm: 91482.780 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4978/ 159576 | consumed samples: 120816 | elapsed time per iteration (ms): 15753.9 | learning rate: 3.344E-05 | global batch size: 48 | lm loss: 6.475079E+00 | loss scale: 8192.0 | grad norm: 167594.086 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4979/ 159576 | consumed samples: 120864 | elapsed time per iteration (ms): 15592.8 | learning rate: 3.345E-05 | global batch size: 48 | lm loss: 6.377496E+00 | loss scale: 8192.0 | grad norm: 94710.465 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4980/ 159576 | consumed samples: 120912 | elapsed time per iteration (ms): 15439.6 | learning rate: 3.347E-05 | global batch size: 48 | lm loss: 6.396212E+00 | loss scale: 8192.0 | grad norm: 82226.776 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4981/ 159576 | consumed samples: 120960 | elapsed time per iteration (ms): 15453.4 | learning rate: 3.348E-05 | global batch size: 48 | lm loss: 6.392390E+00 | loss scale: 8192.0 | grad norm: 93532.515 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4982/ 159576 | consumed samples: 121008 | elapsed time per iteration (ms): 15623.6 | learning rate: 3.349E-05 | global batch size: 48 | lm loss: 6.384733E+00 | loss scale: 8192.0 | grad norm: 99819.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4983/ 159576 | consumed samples: 121056 | elapsed time per iteration (ms): 15476.4 | learning rate: 3.351E-05 | global batch size: 48 | lm loss: 6.365707E+00 | loss scale: 8192.0 | grad norm: 115195.515 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4984/ 159576 | consumed samples: 121104 | elapsed time per iteration (ms): 15519.9 | learning rate: 3.352E-05 | global batch size: 48 | lm loss: 6.280232E+00 | loss scale: 8192.0 | grad norm: 88569.976 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4985/ 159576 | consumed samples: 121152 | elapsed time per iteration (ms): 15489.3 | learning rate: 3.353E-05 | global batch size: 48 | lm loss: 6.514761E+00 | loss scale: 8192.0 | grad norm: 110101.646 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4986/ 159576 | consumed samples: 121200 | elapsed time per iteration (ms): 15582.9 | learning rate: 3.355E-05 | global batch size: 48 | lm loss: 6.394022E+00 | loss scale: 8192.0 | grad norm: 104900.137 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4987/ 159576 | consumed samples: 121248 | elapsed time per iteration (ms): 15478.8 | learning rate: 3.356E-05 | global batch size: 48 | lm loss: 6.428993E+00 | loss scale: 8192.0 | grad norm: 99980.054 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4988/ 159576 | consumed samples: 121296 | elapsed time per iteration (ms): 15470.8 | learning rate: 3.357E-05 | global batch size: 48 | lm loss: 6.383337E+00 | loss scale: 8192.0 | grad norm: 96150.673 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4989/ 159576 | 
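Note: the learning rate climbs near-linearly through this excerpt, i.e. the run is still in warmup. Two displayed values give a slope of roughly 1.33e-08 per step, which extrapolates cleanly to the later records:

    # Warmup-slope estimate from two records in the excerpt
    # (iteration 4897 -> 3.236e-05, iteration 4988 -> 3.357e-05).
    lr_a, it_a = 3.236e-05, 4897
    lr_b, it_b = 3.357e-05, 4988
    slope = (lr_b - lr_a) / (it_b - it_a)
    print(f"{slope:.3e}")                         # ~1.330e-08 per iteration
    # Extrapolate to iteration 5100; the log there shows 3.506E-05.
    print(f"{lr_b + slope * (5100 - it_b):.3e}")  # ~3.506e-05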
consumed samples: 121344 | elapsed time per iteration (ms): 15490.7 | learning rate: 3.359E-05 | global batch size: 48 | lm loss: 6.440140E+00 | loss scale: 8192.0 | grad norm: 99225.792 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4990/ 159576 | consumed samples: 121392 | elapsed time per iteration (ms): 16022.8 | learning rate: 3.360E-05 | global batch size: 48 | lm loss: 6.329103E+00 | loss scale: 8192.0 | grad norm: 77357.711 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4991/ 159576 | consumed samples: 121440 | elapsed time per iteration (ms): 15500.7 | learning rate: 3.361E-05 | global batch size: 48 | lm loss: 6.346808E+00 | loss scale: 8192.0 | grad norm: 83379.862 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4992/ 159576 | consumed samples: 121488 | elapsed time per iteration (ms): 15638.6 | learning rate: 3.363E-05 | global batch size: 48 | lm loss: 6.460890E+00 | loss scale: 8192.0 | grad norm: 114878.567 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4993/ 159576 | consumed samples: 121536 | elapsed time per iteration (ms): 15882.0 | learning rate: 3.364E-05 | global batch size: 48 | lm loss: 6.485402E+00 | loss scale: 8192.0 | grad norm: 164153.089 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4994/ 159576 | consumed samples: 121584 | elapsed time per iteration (ms): 15543.1 | learning rate: 3.365E-05 | global batch size: 48 | lm loss: 6.511444E+00 | loss scale: 8192.0 | grad norm: 102365.809 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4995/ 159576 | consumed samples: 121632 | elapsed time per iteration (ms): 15538.2 | learning rate: 3.367E-05 | global batch size: 48 | lm loss: 6.413379E+00 | loss scale: 8192.0 | grad norm: 115181.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4996/ 159576 | consumed samples: 121680 | elapsed time per iteration (ms): 15508.4 | learning rate: 3.368E-05 | global batch size: 48 | lm loss: 6.359092E+00 | loss scale: 8192.0 | grad norm: 117830.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4997/ 159576 | consumed samples: 121728 | elapsed time per iteration (ms): 15913.3 | learning rate: 3.369E-05 | global batch size: 48 | lm loss: 6.388143E+00 | loss scale: 8192.0 | grad norm: 90924.050 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4998/ 159576 | consumed samples: 121776 | elapsed time per iteration (ms): 15515.1 | learning rate: 3.371E-05 | global batch size: 48 | lm loss: 6.391698E+00 | loss scale: 8192.0 | grad norm: 103478.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4999/ 159576 | consumed samples: 121824 | elapsed time per iteration (ms): 15550.7 | learning rate: 3.372E-05 | global batch size: 48 | lm loss: 6.279563E+00 | loss scale: 8192.0 | grad norm: 129145.172 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5000/ 159576 | consumed samples: 121872 | elapsed time per iteration (ms): 15649.4 | learning rate: 3.373E-05 | global batch size: 48 | lm loss: 6.273400E+00 | loss scale: 8192.0 | grad norm: 105673.602 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
------------------------------------------------------------------------------------------------
validation loss at iteration 5000 | lm loss value: 6.285937E+00 | lm loss PPL: 5.369671E+02 |
------------------------------------------------------------------------------------------------
iteration 5001/ 159576 | consumed samples: 121920 | elapsed time per iteration (ms): 23355.2 | learning rate: 3.375E-05 | global batch size: 48 | lm loss: 6.432467E+00 | loss scale: 8192.0 | grad norm: 94517.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5002/ 159576 | consumed samples: 121968 | elapsed time per iteration (ms): 15480.8 | learning rate: 3.376E-05 | global batch size: 48 | lm loss: 6.406679E+00 | loss scale: 8192.0 | grad norm: 93989.506 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5003/ 159576 | consumed samples: 122016 | elapsed time per iteration (ms): 15462.8 | learning rate: 3.377E-05 | global batch size: 48 | lm loss: 6.425644E+00 | loss scale: 8192.0 | grad norm: 89681.033 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5004/ 159576 | consumed samples: 122064 | elapsed time per iteration (ms): 15981.7 | learning rate: 3.379E-05 | global batch size: 48 | lm loss: 6.492604E+00 | loss scale: 8192.0 | grad norm: 95165.571 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5005/ 159576 | consumed samples: 122112 | elapsed time per iteration (ms): 15437.2 | learning rate: 3.380E-05 | global batch size: 48 | lm loss: 6.335800E+00 | loss scale: 8192.0 | grad norm: 84441.007 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5006/ 159576 | consumed samples: 122160 | elapsed time per iteration (ms): 15473.9 | learning rate: 3.381E-05 | global batch size: 48 | lm loss: 6.304031E+00 | loss scale: 8192.0 | grad norm: 87318.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5007/ 159576 | consumed samples: 122208 | elapsed time per iteration (ms): 15548.0 | learning rate: 3.383E-05 | global batch size: 48 | lm loss: 6.363890E+00 | loss scale: 8192.0 | grad norm: 92281.656 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5008/ 159576 | consumed samples: 122256 | elapsed time per iteration (ms): 15796.4 | learning rate: 3.384E-05 | global batch size: 48 | lm loss: 6.347075E+00 | loss scale: 8192.0 | grad norm: 103172.108 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5009/ 159576 | consumed samples: 122304 | elapsed time per iteration (ms): 15464.5 | learning rate: 3.385E-05 | global batch size: 48 | lm loss: 6.448061E+00 | loss scale: 8192.0 | grad norm: 95534.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5010/ 159576 | consumed samples: 122352 | elapsed time per iteration (ms): 15447.7 | learning rate: 3.387E-05 | global batch size: 48 | lm loss: 6.328472E+00 | loss scale: 8192.0 | grad norm: 84995.521 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5011/ 159576 | consumed samples: 122400 | elapsed time per iteration (ms): 15420.5 | learning rate: 3.388E-05 |
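Note: the perplexity in the validation block is just the exponential of the reported LM loss, which is easy to confirm:

    import math

    # Validation at iteration 5000: lm loss 6.285937, reported PPL 5.369671E+02.
    lm_loss = 6.285937
    print(round(math.exp(lm_loss), 2))  # 536.97, matching the logged 5.369671E+02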
global batch size: 48 | lm loss: 6.340866E+00 | loss scale: 8192.0 | grad norm: 82422.631 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5012/ 159576 | consumed samples: 122448 | elapsed time per iteration (ms): 15839.2 | learning rate: 3.389E-05 | global batch size: 48 | lm loss: 6.397783E+00 | loss scale: 8192.0 | grad norm: 162057.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5013/ 159576 | consumed samples: 122496 | elapsed time per iteration (ms): 15565.6 | learning rate: 3.391E-05 | global batch size: 48 | lm loss: 6.363326E+00 | loss scale: 8192.0 | grad norm: 86690.137 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5014/ 159576 | consumed samples: 122544 | elapsed time per iteration (ms): 15554.7 | learning rate: 3.392E-05 | global batch size: 48 | lm loss: 6.421363E+00 | loss scale: 8192.0 | grad norm: 102318.730 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5015/ 159576 | consumed samples: 122592 | elapsed time per iteration (ms): 15616.9 | learning rate: 3.393E-05 | global batch size: 48 | lm loss: 6.322345E+00 | loss scale: 8192.0 | grad norm: 83052.732 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5016/ 159576 | consumed samples: 122640 | elapsed time per iteration (ms): 15870.8 | learning rate: 3.395E-05 | global batch size: 48 | lm loss: 6.384270E+00 | loss scale: 8192.0 | grad norm: 167288.542 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5017/ 159576 | consumed samples: 122688 | elapsed time per iteration (ms): 15476.4 | learning rate: 3.396E-05 | global batch size: 48 | lm loss: 6.423479E+00 | loss scale: 8192.0 | grad norm: 86029.728 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5018/ 159576 | consumed samples: 122736 | elapsed time per iteration (ms): 15464.3 | learning rate: 3.397E-05 | global batch size: 48 | lm loss: 6.393809E+00 | loss scale: 8192.0 | grad norm: 123082.971 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5019/ 159576 | consumed samples: 122784 | elapsed time per iteration (ms): 15459.3 | learning rate: 3.399E-05 | global batch size: 48 | lm loss: 6.420121E+00 | loss scale: 8192.0 | grad norm: 82967.862 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5020/ 159576 | consumed samples: 122832 | elapsed time per iteration (ms): 15660.8 | learning rate: 3.400E-05 | global batch size: 48 | lm loss: 6.436828E+00 | loss scale: 8192.0 | grad norm: 94157.906 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-24 22:07:41] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition)
[2021-09-24 22:07:41] PULSE: tr8-104B is running for 16:15:30 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8])
iteration 5021/ 159576 | consumed samples: 122880 | elapsed time per iteration (ms): 15506.9 |
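Note: the PULSE lines identify the allocation by a compressed SLURM hostlist (r6i4n[5,7], r9i2n[3-5,8], ...). On a machine with SLURM installed, `scontrol show hostnames` expands these; below is a small pure-Python sketch for the simple prefix[a,b-c] forms that appear here. The function name is mine, and zero-padded or nested forms are not handled:

    import re

    def expand_hostlist(hostlist):
        """Expand entries like 'r6i4n[5,7],r8i2n8' into host names.

        Handles only the simple prefix[a,b-c] forms seen in the PULSE
        lines above; a convenience sketch, not a full SLURM parser.
        """
        hosts = []
        # Grab comma-separated entries, keeping any bracket group attached.
        for entry in re.findall(r"[^,\[\]]+(?:\[[^\]]*\])?", hostlist):
            m = re.fullmatch(r"([^\[\]]+)\[([^\]]*)\]", entry)
            if not m:
                hosts.append(entry)
                continue
            prefix, ranges = m.groups()
            for part in ranges.split(","):
                lo, _, hi = part.partition("-")
                for i in range(int(lo), int(hi or lo) + 1):
                    hosts.append(f"{prefix}{i}")
        return hosts

    print(expand_hostlist("r6i4n[5,7],r8i2n8,r9i2n[3-5,8]"))
    # ['r6i4n5', 'r6i4n7', 'r8i2n8', 'r9i2n3', 'r9i2n4', 'r9i2n5', 'r9i2n8']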
learning rate: 3.401E-05 | global batch size: 48 | lm loss: 6.230031E+00 | loss scale: 8192.0 | grad norm: 93236.907 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5022/ 159576 | consumed samples: 122928 | elapsed time per iteration (ms): 15486.4 | learning rate: 3.403E-05 | global batch size: 48 | lm loss: 6.434629E+00 | loss scale: 8192.0 | grad norm: 88122.737 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5023/ 159576 | consumed samples: 122976 | elapsed time per iteration (ms): 15558.0 | learning rate: 3.404E-05 | global batch size: 48 | lm loss: 6.447264E+00 | loss scale: 8192.0 | grad norm: 99782.616 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5024/ 159576 | consumed samples: 123024 | elapsed time per iteration (ms): 15657.7 | learning rate: 3.405E-05 | global batch size: 48 | lm loss: 6.403034E+00 | loss scale: 8192.0 | grad norm: 102592.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5025/ 159576 | consumed samples: 123072 | elapsed time per iteration (ms): 15429.0 | learning rate: 3.407E-05 | global batch size: 48 | lm loss: 6.433703E+00 | loss scale: 8192.0 | grad norm: 82492.534 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5026/ 159576 | consumed samples: 123120 | elapsed time per iteration (ms): 15492.8 | learning rate: 3.408E-05 | global batch size: 48 | lm loss: 6.505131E+00 | loss scale: 8192.0 | grad norm: 334700.898 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5027/ 159576 | consumed samples: 123168 | elapsed time per iteration (ms): 15456.4 | learning rate: 3.409E-05 | global batch size: 48 | lm loss: 6.312271E+00 | loss scale: 8192.0 | grad norm: 101204.541 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5028/ 159576 | consumed samples: 123216 | elapsed time per iteration (ms): 15841.8 | learning rate: 3.411E-05 | global batch size: 48 | lm loss: 6.368502E+00 | loss scale: 8192.0 | grad norm: 103816.078 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5029/ 159576 | consumed samples: 123264 | elapsed time per iteration (ms): 15474.5 | learning rate: 3.412E-05 | global batch size: 48 | lm loss: 6.350607E+00 | loss scale: 8192.0 | grad norm: 88025.860 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5030/ 159576 | consumed samples: 123312 | elapsed time per iteration (ms): 15468.9 | learning rate: 3.413E-05 | global batch size: 48 | lm loss: 6.421462E+00 | loss scale: 8192.0 | grad norm: 121501.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5031/ 159576 | consumed samples: 123360 | elapsed time per iteration (ms): 15894.7 | learning rate: 3.414E-05 | global batch size: 48 | lm loss: 6.452309E+00 | loss scale: 8192.0 | grad norm: 98299.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5032/ 159576 | consumed samples: 123408 | elapsed time per iteration (ms): 15372.6 | learning rate: 3.416E-05 | global batch size: 48 | lm loss: 6.470865E+00 | loss scale: 8192.0 | grad norm: 86033.852 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | time (ms) iteration 5033/ 159576 | consumed samples: 123456 | elapsed time per iteration (ms): 15386.4 | learning rate: 3.417E-05 | global batch size: 48 | lm loss: 6.358019E+00 | loss scale: 8192.0 | grad norm: 102254.964 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5034/ 159576 | consumed samples: 123504 | elapsed time per iteration (ms): 15445.3 | learning rate: 3.418E-05 | global batch size: 48 | lm loss: 6.501051E+00 | loss scale: 8192.0 | grad norm: 106902.558 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5035/ 159576 | consumed samples: 123552 | elapsed time per iteration (ms): 15687.1 | learning rate: 3.420E-05 | global batch size: 48 | lm loss: 6.441896E+00 | loss scale: 8192.0 | grad norm: 88100.171 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5036/ 159576 | consumed samples: 123600 | elapsed time per iteration (ms): 15548.9 | learning rate: 3.421E-05 | global batch size: 48 | lm loss: 6.297223E+00 | loss scale: 8192.0 | grad norm: 92260.861 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5037/ 159576 | consumed samples: 123648 | elapsed time per iteration (ms): 15475.3 | learning rate: 3.422E-05 | global batch size: 48 | lm loss: 6.382265E+00 | loss scale: 8192.0 | grad norm: 91449.043 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5038/ 159576 | consumed samples: 123696 | elapsed time per iteration (ms): 15468.3 | learning rate: 3.424E-05 | global batch size: 48 | lm loss: 6.354884E+00 | loss scale: 8192.0 | grad norm: 112737.531 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5039/ 159576 | consumed samples: 123744 | elapsed time per iteration (ms): 15758.7 | learning rate: 3.425E-05 | global batch size: 48 | lm loss: 6.504280E+00 | loss scale: 8192.0 | grad norm: 106073.818 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5040/ 159576 | consumed samples: 123792 | elapsed time per iteration (ms): 15421.0 | learning rate: 3.426E-05 | global batch size: 48 | lm loss: 6.361072E+00 | loss scale: 8192.0 | grad norm: 127074.088 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5041/ 159576 | consumed samples: 123840 | elapsed time per iteration (ms): 15385.1 | learning rate: 3.428E-05 | global batch size: 48 | lm loss: 6.289526E+00 | loss scale: 8192.0 | grad norm: 92444.062 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5042/ 159576 | consumed samples: 123888 | elapsed time per iteration (ms): 15433.3 | learning rate: 3.429E-05 | global batch size: 48 | lm loss: 6.276048E+00 | loss scale: 8192.0 | grad norm: 95460.876 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5043/ 159576 | consumed samples: 123936 | elapsed time per iteration (ms): 15839.0 | learning rate: 3.430E-05 | global batch size: 48 | lm loss: 6.447580E+00 | loss scale: 8192.0 | grad norm: 140216.976 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5044/ 159576 | consumed samples: 123984 | elapsed time per iteration (ms): 15579.5 | learning rate: 3.432E-05 | global batch size: 48 | lm loss: 6.390550E+00 
| loss scale: 8192.0 | grad norm: 103110.486 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5045/ 159576 | consumed samples: 124032 | elapsed time per iteration (ms): 15508.8 | learning rate: 3.433E-05 | global batch size: 48 | lm loss: 6.326768E+00 | loss scale: 8192.0 | grad norm: 143773.143 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5046/ 159576 | consumed samples: 124080 | elapsed time per iteration (ms): 15498.6 | learning rate: 3.434E-05 | global batch size: 48 | lm loss: 6.474419E+00 | loss scale: 8192.0 | grad norm: 112141.964 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5047/ 159576 | consumed samples: 124128 | elapsed time per iteration (ms): 15657.7 | learning rate: 3.436E-05 | global batch size: 48 | lm loss: 6.411184E+00 | loss scale: 8192.0 | grad norm: 106306.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5048/ 159576 | consumed samples: 124176 | elapsed time per iteration (ms): 15457.2 | learning rate: 3.437E-05 | global batch size: 48 | lm loss: 6.448883E+00 | loss scale: 8192.0 | grad norm: 119234.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5049/ 159576 | consumed samples: 124224 | elapsed time per iteration (ms): 15413.6 | learning rate: 3.438E-05 | global batch size: 48 | lm loss: 6.307952E+00 | loss scale: 8192.0 | grad norm: 94509.642 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5050/ 159576 | consumed samples: 124272 | elapsed time per iteration (ms): 15423.5 | learning rate: 3.440E-05 | global batch size: 48 | lm loss: 6.399596E+00 | loss scale: 8192.0 | grad norm: 107196.748 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5051/ 159576 | consumed samples: 124320 | elapsed time per iteration (ms): 15555.5 | learning rate: 3.441E-05 | global batch size: 48 | lm loss: 6.345298E+00 | loss scale: 8192.0 | grad norm: 101445.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5052/ 159576 | consumed samples: 124368 | elapsed time per iteration (ms): 15471.9 | learning rate: 3.442E-05 | global batch size: 48 | lm loss: 6.399672E+00 | loss scale: 8192.0 | grad norm: 101071.085 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5053/ 159576 | consumed samples: 124416 | elapsed time per iteration (ms): 15538.7 | learning rate: 3.444E-05 | global batch size: 48 | lm loss: 6.306325E+00 | loss scale: 8192.0 | grad norm: 130980.614 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5054/ 159576 | consumed samples: 124464 | elapsed time per iteration (ms): 15446.5 | learning rate: 3.445E-05 | global batch size: 48 | lm loss: 6.360683E+00 | loss scale: 8192.0 | grad norm: 138731.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5055/ 159576 | consumed samples: 124512 | elapsed time per iteration (ms): 15548.6 | learning rate: 3.446E-05 | global batch size: 48 | lm loss: 6.415308E+00 | loss scale: 8192.0 | grad norm: 172722.048 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5056/ 159576 | consumed samples: 124560 | 
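Note: at roughly 15.5 s per 48-sample step, throughput works out to about 3.1 samples/s. A back-of-envelope projection, ignoring validation and checkpoint stalls such as the 23355.2 ms step at iteration 5001:

    # Back-of-envelope throughput from the records above.
    global_batch_size = 48
    ms_per_iter = 15508.8                                # iteration 5045
    print(global_batch_size / (ms_per_iter / 1000))      # ~3.1 samples/s
    remaining_iters = 159576 - 5045
    print(remaining_iters * ms_per_iter / 1000 / 86400)  # ~27.7 days at this rate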
elapsed time per iteration (ms): 15454.2 | learning rate: 3.448E-05 | global batch size: 48 | lm loss: 6.446492E+00 | loss scale: 8192.0 | grad norm: 114779.854 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5057/ 159576 | consumed samples: 124608 | elapsed time per iteration (ms): 15531.5 | learning rate: 3.449E-05 | global batch size: 48 | lm loss: 6.352797E+00 | loss scale: 8192.0 | grad norm: 93911.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5058/ 159576 | consumed samples: 124656 | elapsed time per iteration (ms): 15916.6 | learning rate: 3.450E-05 | global batch size: 48 | lm loss: 6.394308E+00 | loss scale: 8192.0 | grad norm: 122896.031 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5059/ 159576 | consumed samples: 124704 | elapsed time per iteration (ms): 15639.0 | learning rate: 3.452E-05 | global batch size: 48 | lm loss: 6.497361E+00 | loss scale: 8192.0 | grad norm: 111301.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5060/ 159576 | consumed samples: 124752 | elapsed time per iteration (ms): 15585.9 | learning rate: 3.453E-05 | global batch size: 48 | lm loss: 6.416485E+00 | loss scale: 8192.0 | grad norm: 111209.944 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5061/ 159576 | consumed samples: 124800 | elapsed time per iteration (ms): 15476.2 | learning rate: 3.454E-05 | global batch size: 48 | lm loss: 6.385825E+00 | loss scale: 8192.0 | grad norm: 124134.940 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5062/ 159576 | consumed samples: 124848 | elapsed time per iteration (ms): 15734.0 | learning rate: 3.456E-05 | global batch size: 48 | lm loss: 6.419828E+00 | loss scale: 8192.0 | grad norm: 115134.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5063/ 159576 | consumed samples: 124896 | elapsed time per iteration (ms): 15427.5 | learning rate: 3.457E-05 | global batch size: 48 | lm loss: 6.501984E+00 | loss scale: 8192.0 | grad norm: 94348.909 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5064/ 159576 | consumed samples: 124944 | elapsed time per iteration (ms): 15367.7 | learning rate: 3.458E-05 | global batch size: 48 | lm loss: 6.435040E+00 | loss scale: 8192.0 | grad norm: 107056.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5065/ 159576 | consumed samples: 124992 | elapsed time per iteration (ms): 15376.7 | learning rate: 3.460E-05 | global batch size: 48 | lm loss: 6.347174E+00 | loss scale: 8192.0 | grad norm: 107513.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5066/ 159576 | consumed samples: 125040 | elapsed time per iteration (ms): 15861.2 | learning rate: 3.461E-05 | global batch size: 48 | lm loss: 6.473555E+00 | loss scale: 8192.0 | grad norm: 96134.607 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5067/ 159576 | consumed samples: 125088 | elapsed time per iteration (ms): 15376.8 | learning rate: 3.462E-05 | global batch size: 48 | lm loss: 6.364458E+00 | loss scale: 8192.0 | grad norm: 110987.016 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5068/ 159576 | consumed samples: 125136 | elapsed time per iteration (ms): 15511.1 | learning rate: 3.464E-05 | global batch size: 48 | lm loss: 6.441058E+00 | loss scale: 8192.0 | grad norm: 135931.030 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5069/ 159576 | consumed samples: 125184 | elapsed time per iteration (ms): 15475.4 | learning rate: 3.465E-05 | global batch size: 48 | lm loss: 6.324648E+00 | loss scale: 8192.0 | grad norm: 108716.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5070/ 159576 | consumed samples: 125232 | elapsed time per iteration (ms): 15862.4 | learning rate: 3.466E-05 | global batch size: 48 | lm loss: 6.318436E+00 | loss scale: 8192.0 | grad norm: 103967.885 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5071/ 159576 | consumed samples: 125280 | elapsed time per iteration (ms): 15504.6 | learning rate: 3.468E-05 | global batch size: 48 | lm loss: 6.395255E+00 | loss scale: 8192.0 | grad norm: 108399.090 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5072/ 159576 | consumed samples: 125328 | elapsed time per iteration (ms): 15377.1 | learning rate: 3.469E-05 | global batch size: 48 | lm loss: 6.379922E+00 | loss scale: 8192.0 | grad norm: 103462.577 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5073/ 159576 | consumed samples: 125376 | elapsed time per iteration (ms): 15411.3 | learning rate: 3.470E-05 | global batch size: 48 | lm loss: 6.396028E+00 | loss scale: 8192.0 | grad norm: 95480.077 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5074/ 159576 | consumed samples: 125424 | elapsed time per iteration (ms): 15799.1 | learning rate: 3.472E-05 | global batch size: 48 | lm loss: 6.413391E+00 | loss scale: 8192.0 | grad norm: 150193.560 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5075/ 159576 | consumed samples: 125472 | elapsed time per iteration (ms): 15638.7 | learning rate: 3.473E-05 | global batch size: 48 | lm loss: 6.308775E+00 | loss scale: 8192.0 | grad norm: 129289.081 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5076/ 159576 | consumed samples: 125520 | elapsed time per iteration (ms): 15490.0 | learning rate: 3.474E-05 | global batch size: 48 | lm loss: 6.273424E+00 | loss scale: 8192.0 | grad norm: 137408.576 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5077/ 159576 | consumed samples: 125568 | elapsed time per iteration (ms): 15408.8 | learning rate: 3.476E-05 | global batch size: 48 | lm loss: 6.402836E+00 | loss scale: 8192.0 | grad norm: 549435.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5078/ 159576 | consumed samples: 125616 | elapsed time per iteration (ms): 15586.3 | learning rate: 3.477E-05 | global batch size: 48 | lm loss: 6.309762E+00 | loss scale: 8192.0 | grad norm: 104483.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5079/ 159576 | consumed samples: 125664 | elapsed time per iteration (ms): 15542.8 | learning rate: 3.478E-05 | 
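Note: grad norm hovers around 1e5 in this excerpt but occasionally jumps several-fold (549435.371 at iteration 5077 above, 334700.898 at 5026). A sketch for flagging such outliers against a rolling median; the window and threshold are illustrative, not anything the training code uses:

    from statistics import median

    def flag_spikes(grad_norms, window=25, factor=3.0):
        """Yield (index, value) where a grad norm exceeds `factor` times
        the median of the preceding `window` values."""
        for i, g in enumerate(grad_norms):
            prev = grad_norms[max(0, i - window):i]
            if prev and g > factor * median(prev):
                yield i, g

    # Values copied from the records around iteration 5077:
    norms = [137408.576, 549435.371, 104483.336]
    print(list(flag_spikes(norms, window=2)))  # [(1, 549435.371)]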
global batch size: 48 | lm loss: 6.315629E+00 | loss scale: 8192.0 | grad norm: 91616.745 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5080/ 159576 | consumed samples: 125712 | elapsed time per iteration (ms): 15472.1 | learning rate: 3.480E-05 | global batch size: 48 | lm loss: 6.554045E+00 | loss scale: 8192.0 | grad norm: 172370.169 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5081/ 159576 | consumed samples: 125760 | elapsed time per iteration (ms): 15563.9 | learning rate: 3.481E-05 | global batch size: 48 | lm loss: 6.355201E+00 | loss scale: 8192.0 | grad norm: 125519.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5082/ 159576 | consumed samples: 125808 | elapsed time per iteration (ms): 15777.1 | learning rate: 3.482E-05 | global batch size: 48 | lm loss: 6.435748E+00 | loss scale: 8192.0 | grad norm: 122698.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5083/ 159576 | consumed samples: 125856 | elapsed time per iteration (ms): 15566.4 | learning rate: 3.484E-05 | global batch size: 48 | lm loss: 6.269705E+00 | loss scale: 8192.0 | grad norm: 120100.832 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5084/ 159576 | consumed samples: 125904 | elapsed time per iteration (ms): 15633.9 | learning rate: 3.485E-05 | global batch size: 48 | lm loss: 6.357334E+00 | loss scale: 8192.0 | grad norm: 98996.539 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5085/ 159576 | consumed samples: 125952 | elapsed time per iteration (ms): 15985.6 | learning rate: 3.486E-05 | global batch size: 48 | lm loss: 6.393430E+00 | loss scale: 8192.0 | grad norm: 96935.838 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5086/ 159576 | consumed samples: 126000 | elapsed time per iteration (ms): 15483.1 | learning rate: 3.488E-05 | global batch size: 48 | lm loss: 6.307817E+00 | loss scale: 8192.0 | grad norm: 105392.466 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5087/ 159576 | consumed samples: 126048 | elapsed time per iteration (ms): 15492.6 | learning rate: 3.489E-05 | global batch size: 48 | lm loss: 6.307018E+00 | loss scale: 8192.0 | grad norm: 119838.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5088/ 159576 | consumed samples: 126096 | elapsed time per iteration (ms): 15510.3 | learning rate: 3.490E-05 | global batch size: 48 | lm loss: 6.400391E+00 | loss scale: 8192.0 | grad norm: 124265.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5089/ 159576 | consumed samples: 126144 | elapsed time per iteration (ms): 15885.9 | learning rate: 3.492E-05 | global batch size: 48 | lm loss: 6.333194E+00 | loss scale: 8192.0 | grad norm: 115702.613 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5090/ 159576 | consumed samples: 126192 | elapsed time per iteration (ms): 15544.2 | learning rate: 3.493E-05 | global batch size: 48 | lm loss: 6.331620E+00 | loss scale: 8192.0 | grad norm: 137239.041 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) 
iteration 5091/ 159576 | consumed samples: 126240 | elapsed time per iteration (ms): 15557.8 | learning rate: 3.494E-05 | global batch size: 48 | lm loss: 6.437903E+00 | loss scale: 8192.0 | grad norm: 233688.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5092/ 159576 | consumed samples: 126288 | elapsed time per iteration (ms): 15511.8 | learning rate: 3.496E-05 | global batch size: 48 | lm loss: 6.421580E+00 | loss scale: 8192.0 | grad norm: 127898.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5093/ 159576 | consumed samples: 126336 | elapsed time per iteration (ms): 16146.9 | learning rate: 3.497E-05 | global batch size: 48 | lm loss: 6.348750E+00 | loss scale: 8192.0 | grad norm: 200287.595 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5094/ 159576 | consumed samples: 126384 | elapsed time per iteration (ms): 15650.7 | learning rate: 3.498E-05 | global batch size: 48 | lm loss: 6.384042E+00 | loss scale: 8192.0 | grad norm: 141808.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5095/ 159576 | consumed samples: 126432 | elapsed time per iteration (ms): 15549.8 | learning rate: 3.500E-05 | global batch size: 48 | lm loss: 6.380728E+00 | loss scale: 8192.0 | grad norm: 113750.693 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5096/ 159576 | consumed samples: 126480 | elapsed time per iteration (ms): 15494.8 | learning rate: 3.501E-05 | global batch size: 48 | lm loss: 6.329007E+00 | loss scale: 8192.0 | grad norm: 142607.603 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5097/ 159576 | consumed samples: 126528 | elapsed time per iteration (ms): 15805.4 | learning rate: 3.502E-05 | global batch size: 48 | lm loss: 6.331810E+00 | loss scale: 8192.0 | grad norm: 125989.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5098/ 159576 | consumed samples: 126576 | elapsed time per iteration (ms): 15560.8 | learning rate: 3.504E-05 | global batch size: 48 | lm loss: 6.349818E+00 | loss scale: 8192.0 | grad norm: 164955.758 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5099/ 159576 | consumed samples: 126624 | elapsed time per iteration (ms): 15574.8 | learning rate: 3.505E-05 | global batch size: 48 | lm loss: 6.511029E+00 | loss scale: 8192.0 | grad norm: 150219.938 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5100/ 159576 | consumed samples: 126672 | elapsed time per iteration (ms): 15588.9 | learning rate: 3.506E-05 | global batch size: 48 | lm loss: 6.365673E+00 | loss scale: 8192.0 | grad norm: 132801.144 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5101/ 159576 | consumed samples: 126720 | elapsed time per iteration (ms): 15620.0 | learning rate: 3.508E-05 | global batch size: 48 | lm loss: 6.393438E+00 | loss scale: 8192.0 | grad norm: 181251.963 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5102/ 159576 | consumed samples: 126768 | elapsed time per iteration (ms): 15489.4 | learning rate: 3.509E-05 | global batch size: 48 | lm loss: 6.416411E+00 | loss scale: 8192.0 
(tail of the preceding iteration's record, head cut off above:) | grad norm: 117102.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)

Iterations 5103-5321, reflowed one row per logged step. Fields that are identical in every record of this span are factored out of the table: total iterations 159576 | global batch size: 48 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0; each raw record also ended with an empty "time (ms)" trailer, omitted below.

iteration | consumed samples | elapsed ms/iter | learning rate | lm loss | loss scale | grad norm
5103 | 126816 | 15557.2 | 3.510E-05 | 6.328413E+00 | 8192.0 | 187671.141
5104 | 126864 | 15527.6 | 3.512E-05 | 6.465903E+00 | 8192.0 | 190613.302
5105 | 126912 | 8977.0 | 3.512E-05 | 6.508333E+00 | 4096.0 | 190613.302
5106 | 126960 | 15010.8 | 3.513E-05 | 6.436017E+00 | 4096.0 | 59199.826
5107 | 127008 | 15527.1 | 3.514E-05 | 6.357530E+00 | 4096.0 | 72710.163
5108 | 127056 | 15496.3 | 3.516E-05 | 6.394055E+00 | 4096.0 | 94748.377
5109 | 127104 | 15957.2 | 3.517E-05 | 6.443262E+00 | 4096.0 | 61224.800
5110 | 127152 | 15587.8 | 3.518E-05 | 6.400789E+00 | 4096.0 | 97179.001
5111 | 127200 | 15522.6 | 3.520E-05 | 6.368151E+00 | 4096.0 | 103211.934
5112 | 127248 | 15555.5 | 3.521E-05 | 6.389073E+00 | 4096.0 | 68143.489
5113 | 127296 | 15672.8 | 3.522E-05 | 6.453850E+00 | 4096.0 | 80102.261
5114 | 127344 | 15462.8 | 3.524E-05 | 6.448624E+00 | 4096.0 | 79184.876
5115 | 127392 | 15488.2 | 3.525E-05 | 6.440034E+00 | 4096.0 | 65278.408
5116 | 127440 | 15517.5 | 3.526E-05 | 6.452240E+00 | 4096.0 | 81154.588
5117 | 127488 | 15650.3 | 3.528E-05 | 6.352810E+00 | 4096.0 | 70667.188
5118 | 127536 | 15553.2 | 3.529E-05 | 6.422338E+00 | 4096.0 | 76003.454
5119 | 127584 | 15525.1 | 3.530E-05 | 6.345719E+00 | 4096.0 | 75153.995
5120 | 127632 | 15941.5 | 3.532E-05 | 6.406080E+00 | 4096.0 | 61393.266
5121 | 127680 | 15581.4 | 3.533E-05 | 6.333064E+00 | 4096.0 | 84273.861
5122 | 127728 | 15534.4 | 3.534E-05 | 6.430450E+00 | 4096.0 | 71025.296
5123 | 127776 | 15491.5 | 3.536E-05 | 6.372457E+00 | 4096.0 | 60958.441
5124 | 127824 | 15825.8 | 3.537E-05 | 6.359689E+00 | 4096.0 | 69184.583
5125 | 127872 | 15572.0 | 3.538E-05 | 6.354432E+00 | 4096.0 | 81726.924
5126 | 127920 | 15546.1 | 3.540E-05 | 6.383263E+00 | 4096.0 | 67932.048
5127 | 127968 | 15512.5 | 3.541E-05 | 6.323973E+00 | 4096.0 | 69551.089
5128 | 128016 | 15872.2 | 3.542E-05 | 6.384116E+00 | 4096.0 | 66160.808
5129 | 128064 | 15540.5 | 3.544E-05 | 6.273410E+00 | 4096.0 | 68712.154
5130 | 128112 | 15510.9 | 3.545E-05 | 6.393827E+00 | 4096.0 | 80347.476
5131 | 128160 | 15546.9 | 3.546E-05 | 6.494912E+00 | 4096.0 | 79601.794
5132 | 128208 | 15850.8 | 3.548E-05 | 6.363180E+00 | 4096.0 | 59957.448
5133 | 128256 | 15572.0 | 3.549E-05 | 6.361386E+00 | 4096.0 | 65589.795
5134 | 128304 | 15554.8 | 3.550E-05 | 6.338229E+00 | 4096.0 | 70953.865
5135 | 128352 | 15508.1 | 3.552E-05 | 6.265258E+00 | 4096.0 | 101476.397
5136 | 128400 | 15713.9 | 3.553E-05 | 6.443205E+00 | 4096.0 | 70676.423
5137 | 128448 | 15500.3 | 3.554E-05 | 6.297948E+00 | 4096.0 | 50734.773
5138 | 128496 | 15505.3 | 3.556E-05 | 6.343609E+00 | 4096.0 | 67207.942
5139 | 128544 | 15531.1 | 3.557E-05 | 6.422406E+00 | 4096.0 | 50444.513
5140 | 128592 | 15679.9 | 3.558E-05 | 6.377341E+00 | 4096.0 | 71866.018
5141 | 128640 | 15549.3 | 3.560E-05 | 6.403359E+00 | 4096.0 | 64942.411
5142 | 128688 | 15525.2 | 3.561E-05 | 6.390831E+00 | 4096.0 | 66674.644
5143 | 128736 | 15540.8 | 3.562E-05 | 6.391725E+00 | 4096.0 | 59980.301
5144 | 128784 | 15885.0 | 3.564E-05 | 6.459509E+00 | 4096.0 | 136366.609
5145 | 128832 | 15452.0 | 3.565E-05 | 6.528796E+00 | 4096.0 | 82183.349
5146 | 128880 | 15509.1 | 3.566E-05 | 6.420625E+00 | 4096.0 | 69812.351
5147 | 128928 | 15918.9 | 3.568E-05 | 6.436305E+00 | 4096.0 | 63955.498
5148 | 128976 | 15526.4 | 3.569E-05 | 6.339918E+00 | 4096.0 | 56857.758
5149 | 129024 | 15529.0 | 3.570E-05 | 6.345021E+00 | 4096.0 | 93115.718
5150 | 129072 | 15542.6 | 3.572E-05 | 6.311335E+00 | 4096.0 | 61629.742
5151 | 129120 | 15904.0 | 3.573E-05 | 6.397278E+00 | 4096.0 | 65208.827
5152 | 129168 | 15450.1 | 3.574E-05 | 6.345972E+00 | 4096.0 | 72003.182
5153 | 129216 | 15533.3 | 3.576E-05 | 6.411428E+00 | 4096.0 | 105237.969
5154 | 129264 | 15505.2 | 3.577E-05 | 6.320354E+00 | 4096.0 | 101458.750
5155 | 129312 | 15994.4 | 3.578E-05 | 6.453386E+00 | 4096.0 | 118215.611
5156 | 129360 | 15565.8 | 3.580E-05 | 6.443649E+00 | 4096.0 | 72691.423
5157 | 129408 | 15539.2 | 3.581E-05 | 6.528984E+00 | 4096.0 | 72165.791
5158 | 129456 | 15536.3 | 3.582E-05 | 6.398818E+00 | 4096.0 | 69046.921
5159 | 129504 | 15739.5 | 3.584E-05 | 6.384636E+00 | 4096.0 | 65721.319
5160 | 129552 | 15530.3 | 3.585E-05 | 6.340583E+00 | 4096.0 | 70984.261
5161 | 129600 | 15537.1 | 3.586E-05 | 6.299366E+00 | 4096.0 | 120531.429
5162 | 129648 | 15525.1 | 3.588E-05 | 6.422726E+00 | 4096.0 | 80943.603
5163 | 129696 | 15737.7 | 3.589E-05 | 6.343781E+00 | 4096.0 | 62800.221
5164 | 129744 | 15570.2 | 3.590E-05 | 6.478961E+00 | 4096.0 | 49279.442
5165 | 129792 | 15467.9 | 3.592E-05 | 6.465704E+00 | 4096.0 | 56608.697
5166 | 129840 | 15511.0 | 3.593E-05 | 6.389446E+00 | 4096.0 | 64287.210
5167 | 129888 | 15650.0 | 3.594E-05 | 6.432152E+00 | 4096.0 | 68389.100
5168 | 129936 | 15501.5 | 3.596E-05 | 6.311705E+00 | 4096.0 | 60127.301
5169 | 129984 | 15500.0 | 3.597E-05 | 6.459386E+00 | 4096.0 | 193850.992
5170 | 130032 | 15853.5 | 3.598E-05 | 6.359794E+00 | 4096.0 | 201400.324
5171 | 130080 | 15565.6 | 3.600E-05 | 6.447841E+00 | 4096.0 | 60758.011
5172 | 130128 | 15439.0 | 3.601E-05 | 6.390144E+00 | 4096.0 | 60173.953
5173 | 130176 | 15512.4 | 3.602E-05 | 6.471553E+00 | 4096.0 | 65209.828
5174 | 130224 | 15753.1 | 3.604E-05 | 6.363354E+00 | 4096.0 | 66471.065
5175 | 130272 | 15415.5 | 3.605E-05 | 6.418964E+00 | 4096.0 | 63654.751
5176 | 130320 | 15469.1 | 3.606E-05 | 6.357801E+00 | 4096.0 | 82288.957
5177 | 130368 | 15407.1 | 3.608E-05 | 6.479723E+00 | 4096.0 | 63508.625
5178 | 130416 | 15785.1 | 3.609E-05 | 6.532706E+00 | 4096.0 | 62734.072
5179 | 130464 | 15467.8 | 3.610E-05 | 6.442670E+00 | 4096.0 | 64963.382
5180 | 130512 | 15479.5 | 3.612E-05 | 6.373410E+00 | 4096.0 | 62492.194
5181 | 130560 | 15413.5 | 3.613E-05 | 6.442731E+00 | 4096.0 | 93654.611
5182 | 130608 | 15788.0 | 3.614E-05 | 6.356236E+00 | 4096.0 | 77133.068
5183 | 130656 | 15436.5 | 3.616E-05 | 6.321268E+00 | 4096.0 | 138010.507
5184 | 130704 | 15417.0 | 3.617E-05 | 6.463357E+00 | 4096.0 | 67977.572
5185 | 130752 | 15399.1 | 3.618E-05 | 6.369720E+00 | 4096.0 | 73939.997
5186 | 130800 | 15682.4 | 3.620E-05 | 6.404753E+00 | 4096.0 | 71441.970
5187 | 130848 | 15500.0 | 3.621E-05 | 6.418368E+00 | 4096.0 | 85130.256
5188 | 130896 | 15437.0 | 3.622E-05 | 6.391647E+00 | 4096.0 | 66283.229
5189 | 130944 | 15475.7 | 3.624E-05 | 6.322616E+00 | 4096.0 | 75047.649
5190 | 130992 | 15579.8 | 3.625E-05 | 6.431418E+00 | 4096.0 | 58908.817
5191 | 131040 | 15429.7 | 3.626E-05 | 6.535919E+00 | 4096.0 | 122859.857
5192 | 131088 | 15437.2 | 3.628E-05 | 6.220134E+00 | 4096.0 | 92437.561
5193 | 131136 | 15429.8 | 3.629E-05 | 6.373948E+00 | 4096.0 | 93116.737
5194 | 131184 | 15588.8 | 3.630E-05 | 6.390661E+00 | 4096.0 | 64520.956
5195 | 131232 | 15414.6 | 3.632E-05 | 6.359470E+00 | 4096.0 | 61039.424
5196 | 131280 | 15469.0 | 3.633E-05 | 6.426967E+00 | 4096.0 | 69860.175
5197 | 131328 | 15399.3 | 3.634E-05 | 6.397369E+00 | 4096.0 | 67025.925
5198 | 131376 | 15852.9 | 3.636E-05 | 6.470811E+00 | 4096.0 | 94172.614
5199 | 131424 | 15428.8 | 3.637E-05 | 6.341267E+00 | 4096.0 | 73918.814
5200 | 131472 | 15444.1 | 3.638E-05 | 6.434019E+00 | 4096.0 | 107373.139
5201 | 131520 | 15807.8 | 3.639E-05 | 6.288959E+00 | 4096.0 | 60538.434
5202 | 131568 | 15428.1 | 3.641E-05 | 6.382991E+00 | 4096.0 | 87744.726
5203 | 131616 | 15473.7 | 3.642E-05 | 6.421006E+00 | 4096.0 | 63743.211
5204 | 131664 | 15342.5 | 3.643E-05 | 6.345580E+00 | 4096.0 | 83317.459
5205 | 131712 | 15751.6 | 3.645E-05 | 6.379266E+00 | 4096.0 | 72285.964
5206 | 131760 | 15391.2 | 3.646E-05 | 6.296494E+00 | 4096.0 | 99774.130
5207 | 131808 | 15463.8 | 3.647E-05 | 6.419320E+00 | 4096.0 | 76787.605
5208 | 131856 | 15457.9 | 3.649E-05 | 6.321754E+00 | 4096.0 | 71044.606
5209 | 131904 | 15812.3 | 3.650E-05 | 6.295812E+00 | 4096.0 | 80278.535
5210 | 131952 | 15416.3 | 3.651E-05 | 6.444015E+00 | 4096.0 | 69086.077
5211 | 132000 | 15496.5 | 3.653E-05 | 6.426943E+00 | 4096.0 | 87922.534
5212 | 132048 | 15327.0 | 3.654E-05 | 6.361041E+00 | 4096.0 | 68686.112
5213 | 132096 | 15936.5 | 3.655E-05 | 6.389860E+00 | 4096.0 | 68529.242
5214 | 132144 | 15542.2 | 3.657E-05 | 6.395509E+00 | 4096.0 | 66332.216
5215 | 132192 | 15481.3 | 3.658E-05 | 6.378184E+00 | 4096.0 | 69005.077
5216 | 132240 | 15471.0 | 3.659E-05 | 6.409903E+00 | 4096.0 | 78238.545
5217 | 132288 | 15765.5 | 3.661E-05 | 6.468248E+00 | 4096.0 | 81260.175
5218 | 132336 | 15514.7 | 3.662E-05 | 6.462075E+00 | 4096.0 | 89591.763
5219 | 132384 | 15488.0 | 3.663E-05 | 6.402821E+00 | 4096.0 | 67243.019
5220 | 132432 | 15443.2 | 3.665E-05 | 6.377299E+00 | 4096.0 | 73909.640
5221 | 132480 | 15695.0 | 3.666E-05 | 6.451472E+00 | 4096.0 | 66658.049
5222 | 132528 | 15480.5 | 3.667E-05 | 6.465474E+00 | 4096.0 | 71303.345
5223 | 132576 | 15538.4 | 3.669E-05 | 6.452018E+00 | 4096.0 | 61632.620
5224 | 132624 | 15433.6 | 3.670E-05 | 6.417565E+00 | 4096.0 | 99052.706
5225 | 132672 | 16019.0 | 3.671E-05 | 6.392467E+00 | 4096.0 | 81901.168
5226 | 132720 | 15479.0 | 3.673E-05 | 6.432102E+00 | 4096.0 | 80603.914
5227 | 132768 | 15499.4 | 3.674E-05 | 6.304895E+00 | 4096.0 | 63916.075
5228 | 132816 | 15774.2 | 3.675E-05 | 6.323613E+00 | 4096.0 | 76694.249
5229 | 132864 | 15599.1 | 3.677E-05 | 6.488564E+00 | 4096.0 | 76280.931
5230 | 132912 | 15549.2 | 3.678E-05 | 6.430355E+00 | 4096.0 | 71462.889
5231 | 132960 | 15501.3 | 3.679E-05 | 6.493622E+00 | 4096.0 | 59853.872
5232 | 133008 | 15779.3 | 3.681E-05 | 6.284019E+00 | 4096.0 | 69496.678
5233 | 133056 | 15428.5 | 3.682E-05 | 6.267179E+00 | 4096.0 | 63245.018
5234 | 133104 | 15461.3 | 3.683E-05 | 6.449612E+00 | 4096.0 | 78199.189
5235 | 133152 | 15485.3 | 3.685E-05 | 6.443536E+00 | 4096.0 | 70168.271
5236 | 133200 | 15933.7 | 3.686E-05 | 6.244983E+00 | 4096.0 | 75166.513
5237 | 133248 | 15418.0 | 3.687E-05 | 6.283341E+00 | 4096.0 | 72463.714
5238 | 133296 | 15549.2 | 3.689E-05 | 6.438685E+00 | 4096.0 | 82352.679
5239 | 133344 | 15537.2 | 3.690E-05 | 6.362652E+00 | 4096.0 | 70918.803
5240 | 133392 | 15840.0 | 3.691E-05 | 6.368175E+00 | 4096.0 | 155104.639
5241 | 133440 | 15490.2 | 3.693E-05 | 6.400668E+00 | 4096.0 | 68076.314
5242 | 133488 | 15382.4 | 3.694E-05 | 6.316941E+00 | 4096.0 | 57901.587
5243 | 133536 | 15382.2 | 3.695E-05 | 6.494829E+00 | 4096.0 | 62287.898
5244 | 133584 | 15661.6 | 3.697E-05 | 6.397869E+00 | 4096.0 | 57367.212
5245 | 133632 | 15495.8 | 3.698E-05 | 6.256347E+00 | 4096.0 | 61800.740
5246 | 133680 | 15523.0 | 3.699E-05 | 6.389894E+00 | 4096.0 | 69126.659
5247 | 133728 | 15546.9 | 3.701E-05 | 6.346736E+00 | 4096.0 | 67046.324
5248 | 133776 | 15650.8 | 3.702E-05 | 6.430111E+00 | 4096.0 | 69265.741
5249 | 133824 | 15490.3 | 3.703E-05 | 6.390760E+00 | 4096.0 | 142507.174
5250 | 133872 | 15521.8 | 3.705E-05 | 6.420756E+00 | 4096.0 | 64815.993
5251 | 133920 | 15759.8 | 3.706E-05 | 6.360211E+00 | 4096.0 | 82700.295
5252 | 133968 | 15623.8 | 3.707E-05 | 6.351006E+00 | 4096.0 | 70030.998

[2021-09-24 23:07:37] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition)
[2021-09-24 23:07:37] PULSE: tr8-104B is running for 17:15:26 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8])
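The PULSE line above reports the SLURM allocation as a compressed hostlist. For readers without access to `scontrol show hostnames`, a minimal Python sketch can expand such lists; `expand_hostlist` is a hypothetical helper, simplified relative to SLURM's real expansion (no zero-padding or nested brackets):

```python
import re

def expand_hostlist(hostlist: str) -> list[str]:
    """Expand a compressed SLURM hostlist such as 'r6i4n[5,7],r8i2n8'
    into ['r6i4n5', 'r6i4n7', 'r8i2n8'].  Simplified sketch only."""
    hosts = []
    # Match 'prefix' optionally followed by a bracketed range group,
    # skipping the commas that separate top-level entries.
    for part in re.findall(r'[^,\[\]]+(?:\[[^\]]*\])?', hostlist):
        m = re.fullmatch(r'(.+?)\[([^\]]*)\]', part)
        if not m:
            hosts.append(part)          # plain hostname, e.g. 'r8i2n8'
            continue
        prefix, ranges = m.groups()
        for r in ranges.split(','):     # '5', '7-8', ...
            lo, _, hi = r.partition('-')
            for i in range(int(lo), int(hi or lo) + 1):
                hosts.append(f"{prefix}{i}")
    return hosts

print(expand_hostlist("r6i4n[5,7],r6i5n[2,7-8],r8i2n8"))
# ['r6i4n5', 'r6i4n7', 'r6i5n2', 'r6i5n7', 'r6i5n8', 'r8i2n8']
```

Applied to the full list in the PULSE line it yields one hostname per node in the allocation.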
(table continued after the PULSE status lines)

iteration | consumed samples | elapsed ms/iter | learning rate | lm loss | loss scale | grad norm
5253 | 134016 | 15553.2 | 3.709E-05 | 6.395989E+00 | 4096.0 | 75934.711
5254 | 134064 | 15521.6 | 3.710E-05 | 6.388237E+00 | 4096.0 | 85225.047
5255 | 134112 | 15886.3 | 3.711E-05 | 6.348703E+00 | 4096.0 | 72802.836
5256 | 134160 | 15520.3 | 3.713E-05 | 6.321572E+00 | 4096.0 | 73245.874
5257 | 134208 | 15443.7 | 3.714E-05 | 6.335665E+00 | 4096.0 | 58798.760
5258 | 134256 | 15427.0 | 3.715E-05 | 6.319070E+00 | 4096.0 | 66591.391
5259 | 134304 | 15760.6 | 3.717E-05 | 6.229961E+00 | 4096.0 | 78411.623
5260 | 134352 | 15544.0 | 3.718E-05 | 6.379896E+00 | 4096.0 | 82294.960
5261 | 134400 | 15397.8 | 3.719E-05 | 6.233184E+00 | 4096.0 | 65525.586
5262 | 134448 | 15498.3 | 3.721E-05 | 6.326461E+00 | 4096.0 | 101232.286
5263 | 134496 | 15834.8 | 3.722E-05 | 6.351873E+00 | 4096.0 | 82652.498
5264 | 134544 | 15450.4 | 3.723E-05 | 6.411518E+00 | 4096.0 | 79704.233
5265 | 134592 | 15408.5 | 3.725E-05 | 6.324855E+00 | 4096.0 | 96783.723
5266 | 134640 | 15369.4 | 3.726E-05 | 6.351592E+00 | 4096.0 | 96231.447
5267 | 134688 | 15643.8 | 3.727E-05 | 6.439371E+00 | 4096.0 | 86165.942
5268 | 134736 | 15428.0 | 3.729E-05 | 6.282881E+00 | 4096.0 | 95370.085
5269 | 134784 | 15422.7 | 3.730E-05 | 6.489480E+00 | 4096.0 | 77407.640
5270 | 134832 | 15384.0 | 3.731E-05 | 6.382200E+00 | 4096.0 | 66716.315
5271 | 134880 | 15581.8 | 3.733E-05 | 6.409722E+00 | 4096.0 | 68218.526
5272 | 134928 | 15395.7 | 3.734E-05 | 6.493249E+00 | 4096.0 | 71580.496
5273 | 134976 | 15402.4 | 3.735E-05 | 6.376624E+00 | 4096.0 | 85075.910
5274 | 135024 | 15424.2 | 3.737E-05 | 6.441435E+00 | 4096.0 | 75286.225
5275 | 135072 | 15616.5 | 3.738E-05 | 6.428281E+00 | 4096.0 | 71317.497
5276 | 135120 | 15383.8 | 3.739E-05 | 6.324539E+00 | 4096.0 | 70509.208
5277 | 135168 | 15404.4 | 3.741E-05 | 6.396560E+00 | 4096.0 | 68223.773
5278 | 135216 | 15464.0 | 3.742E-05 | 6.403405E+00 | 4096.0 | 74828.040
5279 | 135264 | 15572.0 | 3.743E-05 | 6.340907E+00 | 4096.0 | 103719.466
5280 | 135312 | 15390.1 | 3.745E-05 | 6.465801E+00 | 4096.0 | 71954.053
5281 | 135360 | 15379.3 | 3.746E-05 | 6.481463E+00 | 4096.0 | 64156.580
5282 | 135408 | 15880.0 | 3.747E-05 | 6.324627E+00 | 4096.0 | 77974.806
5283 | 135456 | 15461.2 | 3.749E-05 | 6.278036E+00 | 4096.0 | 78417.449
5284 | 135504 | 15434.3 | 3.750E-05 | 6.470399E+00 | 4096.0 | 70677.576
5285 | 135552 | 15453.3 | 3.751E-05 | 6.465354E+00 | 4096.0 | 72699.042
5286 | 135600 | 15799.4 | 3.753E-05 | 6.366466E+00 | 4096.0 | 87890.137
5287 | 135648 | 15462.6 | 3.754E-05 | 6.450302E+00 | 4096.0 | 65500.276
5288 | 135696 | 15449.3 | 3.755E-05 | 6.211058E+00 | 4096.0 | 91309.432
5289 | 135744 | 15440.0 | 3.757E-05 | 6.439297E+00 | 4096.0 | 78139.415
5290 | 135792 | 15759.6 | 3.758E-05 | 6.295393E+00 | 4096.0 | 67343.216
5291 | 135840 | 15513.6 | 3.759E-05 | 6.403075E+00 | 4096.0 | 88227.795
5292 | 135888 | 15421.3 | 3.761E-05 | 6.414333E+00 | 4096.0 | 78788.254
5293 | 135936 | 15345.3 | 3.762E-05 | 6.292488E+00 | 4096.0 | 59708.880
5294 | 135984 | 16027.7 | 3.763E-05 | 6.385753E+00 | 4096.0 | 102775.204
5295 | 136032 | 15461.5 | 3.765E-05 | 6.324437E+00 | 4096.0 | 71697.534
5296 | 136080 | 15433.9 | 3.766E-05 | 6.384956E+00 | 4096.0 | 102953.672
5297 | 136128 | 15429.7 | 3.767E-05 | 6.436825E+00 | 4096.0 | 75031.086
5298 | 136176 | 15818.4 | 3.769E-05 | 6.482272E+00 | 4096.0 | 65276.986
5299 | 136224 | 15441.5 | 3.770E-05 | 6.589076E+00 | 4096.0 | 121561.959
5300 | 136272 | 15422.2 | 3.771E-05 | 6.405668E+00 | 4096.0 | 62093.972
5301 | 136320 | 15355.0 | 3.773E-05 | 6.390646E+00 | 4096.0 | 56038.998
5302 | 136368 | 15565.3 | 3.774E-05 | 6.410752E+00 | 4096.0 | 64581.105
5303 | 136416 | 15422.3 | 3.775E-05 | 6.448494E+00 | 4096.0 | 77740.769
5304 | 136464 | 15454.6 | 3.777E-05 | 6.436998E+00 | 4096.0 | 86587.477
5305 | 136512 | 15410.7 | 3.778E-05 | 6.360906E+00 | 4096.0 | 102483.307
5306 | 136560 | 15590.5 | 3.779E-05 | 6.449046E+00 | 4096.0 | 63898.529
5307 | 136608 | 15506.8 | 3.781E-05 | 6.467348E+00 | 4096.0 | 66863.281
5308 | 136656 | 15351.0 | 3.782E-05 | 6.301440E+00 | 4096.0 | 66038.590
5309 | 136704 | 15547.1 | 3.783E-05 | 6.314401E+00 | 4096.0 | 100622.046
5310 | 136752 | 15714.1 | 3.785E-05 | 6.474138E+00 | 4096.0 | 100713.919
5311 | 136800 | 15441.4 | 3.786E-05 | 6.429978E+00 | 4096.0 | 73118.420
5312 | 136848 | 15448.2 | 3.787E-05 | 6.322928E+00 | 4096.0 | 79244.189
5313 | 136896 | 15801.3 | 3.789E-05 | 6.536728E+00 | 4096.0 | 80004.821
5314 | 136944 | 15420.7 | 3.790E-05 | 6.358313E+00 | 4096.0 | 73656.992
5315 | 136992 | 15430.5 | 3.791E-05 | 6.285139E+00 | 4096.0 | 72555.490
5316 | 137040 | 15418.3 | 3.793E-05 | 6.355993E+00 | 4096.0 | 89604.868
5317 | 137088 | 15767.6 | 3.794E-05 | 6.370296E+00 | 4096.0 | 68760.061
5318 | 137136 | 15469.0 | 3.795E-05 | 6.401207E+00 | 4096.0 | 64825.425
5319 | 137184 | 15469.4 | 3.797E-05 | 6.433188E+00 | 4096.0 | 75954.384
5320 | 137232 | 15484.0 | 3.798E-05 | 6.422481E+00 | 4096.0 | 85143.261
5321 | 137280 | 15773.2 | 3.799E-05 | 6.394318E+00 | 4096.0 | 81431.726

(the record for iteration 5322 is cut off here: "iteration 5322/ 159576 | consumed" — it continues in the next section)
samples: 137328 | elapsed time per iteration (ms): 15339.5 | learning rate: 3.801E-05 | global batch size: 48 | lm loss: 6.498918E+00 | loss scale: 4096.0 | grad norm: 76418.870 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5323/ 159576 | consumed samples: 137376 | elapsed time per iteration (ms): 15420.7 | learning rate: 3.802E-05 | global batch size: 48 | lm loss: 6.518599E+00 | loss scale: 4096.0 | grad norm: 71705.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5324/ 159576 | consumed samples: 137424 | elapsed time per iteration (ms): 15420.3 | learning rate: 3.803E-05 | global batch size: 48 | lm loss: 6.429631E+00 | loss scale: 4096.0 | grad norm: 57358.188 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5325/ 159576 | consumed samples: 137472 | elapsed time per iteration (ms): 15903.1 | learning rate: 3.805E-05 | global batch size: 48 | lm loss: 6.407781E+00 | loss scale: 4096.0 | grad norm: 91506.505 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5326/ 159576 | consumed samples: 137520 | elapsed time per iteration (ms): 15425.4 | learning rate: 3.806E-05 | global batch size: 48 | lm loss: 6.399868E+00 | loss scale: 4096.0 | grad norm: 68843.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5327/ 159576 | consumed samples: 137568 | elapsed time per iteration (ms): 15444.3 | learning rate: 3.807E-05 | global batch size: 48 | lm loss: 6.412372E+00 | loss scale: 4096.0 | grad norm: 67149.711 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5328/ 159576 | consumed samples: 137616 | elapsed time per iteration (ms): 15406.6 | learning rate: 3.809E-05 | global batch size: 48 | lm loss: 6.430699E+00 | loss scale: 4096.0 | grad norm: 102742.719 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5329/ 159576 | consumed samples: 137664 | elapsed time per iteration (ms): 15722.7 | learning rate: 3.810E-05 | global batch size: 48 | lm loss: 6.415520E+00 | loss scale: 4096.0 | grad norm: 73301.472 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5330/ 159576 | consumed samples: 137712 | elapsed time per iteration (ms): 15405.0 | learning rate: 3.811E-05 | global batch size: 48 | lm loss: 6.359590E+00 | loss scale: 4096.0 | grad norm: 70222.523 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5331/ 159576 | consumed samples: 137760 | elapsed time per iteration (ms): 15374.6 | learning rate: 3.813E-05 | global batch size: 48 | lm loss: 6.443409E+00 | loss scale: 4096.0 | grad norm: 79619.657 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5332/ 159576 | consumed samples: 137808 | elapsed time per iteration (ms): 15404.3 | learning rate: 3.814E-05 | global batch size: 48 | lm loss: 6.412749E+00 | loss scale: 4096.0 | grad norm: 110889.514 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5333/ 159576 | consumed samples: 137856 | elapsed time per iteration (ms): 15590.4 | learning rate: 3.815E-05 | global batch size: 48 | lm loss: 6.492513E+00 | loss scale: 4096.0 | grad norm: 80255.448 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5334/ 159576 | consumed samples: 137904 | elapsed time per iteration (ms): 15436.5 | learning rate: 3.817E-05 | global batch size: 48 | lm loss: 6.400149E+00 | loss scale: 4096.0 | grad norm: 69554.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5335/ 159576 | consumed samples: 137952 | elapsed time per iteration (ms): 15422.0 | learning rate: 3.818E-05 | global batch size: 48 | lm loss: 6.473186E+00 | loss scale: 4096.0 | grad norm: 96185.543 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5336/ 159576 | consumed samples: 138000 | elapsed time per iteration (ms): 15442.7 | learning rate: 3.819E-05 | global batch size: 48 | lm loss: 6.552884E+00 | loss scale: 4096.0 | grad norm: 73254.921 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5337/ 159576 | consumed samples: 138048 | elapsed time per iteration (ms): 15634.6 | learning rate: 3.821E-05 | global batch size: 48 | lm loss: 6.365612E+00 | loss scale: 4096.0 | grad norm: 57539.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5338/ 159576 | consumed samples: 138096 | elapsed time per iteration (ms): 15386.8 | learning rate: 3.822E-05 | global batch size: 48 | lm loss: 6.445109E+00 | loss scale: 4096.0 | grad norm: 67382.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5339/ 159576 | consumed samples: 138144 | elapsed time per iteration (ms): 15470.1 | learning rate: 3.823E-05 | global batch size: 48 | lm loss: 6.353713E+00 | loss scale: 4096.0 | grad norm: 110272.660 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5340/ 159576 | consumed samples: 138192 | elapsed time per iteration (ms): 15791.0 | learning rate: 3.825E-05 | global batch size: 48 | lm loss: 6.413539E+00 | loss scale: 4096.0 | grad norm: 72349.998 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5341/ 159576 | consumed samples: 138240 | elapsed time per iteration (ms): 15411.4 | learning rate: 3.826E-05 | global batch size: 48 | lm loss: 6.347322E+00 | loss scale: 4096.0 | grad norm: 61859.125 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5342/ 159576 | consumed samples: 138288 | elapsed time per iteration (ms): 15471.9 | learning rate: 3.827E-05 | global batch size: 48 | lm loss: 6.298682E+00 | loss scale: 4096.0 | grad norm: 78125.812 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5343/ 159576 | consumed samples: 138336 | elapsed time per iteration (ms): 15450.5 | learning rate: 3.829E-05 | global batch size: 48 | lm loss: 6.346509E+00 | loss scale: 4096.0 | grad norm: 76921.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5344/ 159576 | consumed samples: 138384 | elapsed time per iteration (ms): 15797.4 | learning rate: 3.830E-05 | global batch size: 48 | lm loss: 6.464560E+00 | loss scale: 4096.0 | grad norm: 73833.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5345/ 159576 | consumed samples: 138432 | elapsed time per iteration (ms): 15447.3 | learning rate: 3.831E-05 | 
global batch size: 48 | lm loss: 6.491942E+00 | loss scale: 4096.0 | grad norm: 58609.094 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5346/ 159576 | consumed samples: 138480 | elapsed time per iteration (ms): 15470.6 | learning rate: 3.833E-05 | global batch size: 48 | lm loss: 6.408776E+00 | loss scale: 4096.0 | grad norm: 61084.726 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5347/ 159576 | consumed samples: 138528 | elapsed time per iteration (ms): 15595.7 | learning rate: 3.834E-05 | global batch size: 48 | lm loss: 6.317072E+00 | loss scale: 4096.0 | grad norm: 79107.564 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5348/ 159576 | consumed samples: 138576 | elapsed time per iteration (ms): 15857.5 | learning rate: 3.835E-05 | global batch size: 48 | lm loss: 6.342214E+00 | loss scale: 4096.0 | grad norm: 82396.508 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5349/ 159576 | consumed samples: 138624 | elapsed time per iteration (ms): 15501.3 | learning rate: 3.837E-05 | global batch size: 48 | lm loss: 6.416060E+00 | loss scale: 4096.0 | grad norm: 58909.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5350/ 159576 | consumed samples: 138672 | elapsed time per iteration (ms): 15334.9 | learning rate: 3.838E-05 | global batch size: 48 | lm loss: 6.348287E+00 | loss scale: 4096.0 | grad norm: 54069.980 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5351/ 159576 | consumed samples: 138720 | elapsed time per iteration (ms): 15454.2 | learning rate: 3.839E-05 | global batch size: 48 | lm loss: 6.456007E+00 | loss scale: 4096.0 | grad norm: 61307.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5352/ 159576 | consumed samples: 138768 | elapsed time per iteration (ms): 15972.1 | learning rate: 3.841E-05 | global batch size: 48 | lm loss: 6.276731E+00 | loss scale: 4096.0 | grad norm: 62789.049 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5353/ 159576 | consumed samples: 138816 | elapsed time per iteration (ms): 15447.0 | learning rate: 3.842E-05 | global batch size: 48 | lm loss: 6.443192E+00 | loss scale: 4096.0 | grad norm: 75454.112 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5354/ 159576 | consumed samples: 138864 | elapsed time per iteration (ms): 15426.1 | learning rate: 3.843E-05 | global batch size: 48 | lm loss: 6.301665E+00 | loss scale: 4096.0 | grad norm: 66381.021 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5355/ 159576 | consumed samples: 138912 | elapsed time per iteration (ms): 15465.4 | learning rate: 3.845E-05 | global batch size: 48 | lm loss: 6.453572E+00 | loss scale: 4096.0 | grad norm: 63236.178 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5356/ 159576 | consumed samples: 138960 | elapsed time per iteration (ms): 15595.7 | learning rate: 3.846E-05 | global batch size: 48 | lm loss: 6.391494E+00 | loss scale: 4096.0 | grad norm: 78457.049 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5357/ 
159576 | consumed samples: 139008 | elapsed time per iteration (ms): 15508.4 | learning rate: 3.847E-05 | global batch size: 48 | lm loss: 6.379974E+00 | loss scale: 4096.0 | grad norm: 85282.485 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5358/ 159576 | consumed samples: 139056 | elapsed time per iteration (ms): 15495.7 | learning rate: 3.849E-05 | global batch size: 48 | lm loss: 6.517261E+00 | loss scale: 4096.0 | grad norm: 75329.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5359/ 159576 | consumed samples: 139104 | elapsed time per iteration (ms): 15455.1 | learning rate: 3.850E-05 | global batch size: 48 | lm loss: 6.311386E+00 | loss scale: 4096.0 | grad norm: 74599.792 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5360/ 159576 | consumed samples: 139152 | elapsed time per iteration (ms): 15693.4 | learning rate: 3.851E-05 | global batch size: 48 | lm loss: 6.481428E+00 | loss scale: 4096.0 | grad norm: 77215.648 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5361/ 159576 | consumed samples: 139200 | elapsed time per iteration (ms): 15475.6 | learning rate: 3.853E-05 | global batch size: 48 | lm loss: 6.331719E+00 | loss scale: 4096.0 | grad norm: 60279.803 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5362/ 159576 | consumed samples: 139248 | elapsed time per iteration (ms): 15551.6 | learning rate: 3.854E-05 | global batch size: 48 | lm loss: 6.506707E+00 | loss scale: 4096.0 | grad norm: 57442.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5363/ 159576 | consumed samples: 139296 | elapsed time per iteration (ms): 15525.0 | learning rate: 3.855E-05 | global batch size: 48 | lm loss: 6.283090E+00 | loss scale: 4096.0 | grad norm: 69167.961 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5364/ 159576 | consumed samples: 139344 | elapsed time per iteration (ms): 15703.9 | learning rate: 3.857E-05 | global batch size: 48 | lm loss: 6.344968E+00 | loss scale: 4096.0 | grad norm: 66351.451 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5365/ 159576 | consumed samples: 139392 | elapsed time per iteration (ms): 15511.9 | learning rate: 3.858E-05 | global batch size: 48 | lm loss: 6.402239E+00 | loss scale: 4096.0 | grad norm: 69893.747 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5366/ 159576 | consumed samples: 139440 | elapsed time per iteration (ms): 15507.6 | learning rate: 3.859E-05 | global batch size: 48 | lm loss: 6.510591E+00 | loss scale: 4096.0 | grad norm: 73294.922 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5367/ 159576 | consumed samples: 139488 | elapsed time per iteration (ms): 15841.0 | learning rate: 3.861E-05 | global batch size: 48 | lm loss: 6.292207E+00 | loss scale: 4096.0 | grad norm: 69220.189 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5368/ 159576 | consumed samples: 139536 | elapsed time per iteration (ms): 15748.2 | learning rate: 3.862E-05 | global batch size: 48 | lm loss: 6.492587E+00 | loss scale: 4096.0 | grad norm: 78294.485 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5369/ 159576 | consumed samples: 139584 | elapsed time per iteration (ms): 15492.3 | learning rate: 3.863E-05 | global batch size: 48 | lm loss: 6.493845E+00 | loss scale: 4096.0 | grad norm: 94517.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5370/ 159576 | consumed samples: 139632 | elapsed time per iteration (ms): 15493.8 | learning rate: 3.864E-05 | global batch size: 48 | lm loss: 6.430061E+00 | loss scale: 4096.0 | grad norm: 77523.471 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5371/ 159576 | consumed samples: 139680 | elapsed time per iteration (ms): 15870.2 | learning rate: 3.866E-05 | global batch size: 48 | lm loss: 6.411311E+00 | loss scale: 4096.0 | grad norm: 69582.630 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5372/ 159576 | consumed samples: 139728 | elapsed time per iteration (ms): 15517.9 | learning rate: 3.867E-05 | global batch size: 48 | lm loss: 6.515477E+00 | loss scale: 4096.0 | grad norm: 75626.793 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5373/ 159576 | consumed samples: 139776 | elapsed time per iteration (ms): 15491.8 | learning rate: 3.868E-05 | global batch size: 48 | lm loss: 6.453342E+00 | loss scale: 4096.0 | grad norm: 69940.821 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5374/ 159576 | consumed samples: 139824 | elapsed time per iteration (ms): 15511.6 | learning rate: 3.870E-05 | global batch size: 48 | lm loss: 6.378087E+00 | loss scale: 4096.0 | grad norm: 70420.660 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5375/ 159576 | consumed samples: 139872 | elapsed time per iteration (ms): 15836.7 | learning rate: 3.871E-05 | global batch size: 48 | lm loss: 6.371119E+00 | loss scale: 4096.0 | grad norm: 56046.647 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5376/ 159576 | consumed samples: 139920 | elapsed time per iteration (ms): 15468.7 | learning rate: 3.872E-05 | global batch size: 48 | lm loss: 6.480386E+00 | loss scale: 4096.0 | grad norm: 67254.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5377/ 159576 | consumed samples: 139968 | elapsed time per iteration (ms): 15505.8 | learning rate: 3.874E-05 | global batch size: 48 | lm loss: 6.445705E+00 | loss scale: 4096.0 | grad norm: 58120.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5378/ 159576 | consumed samples: 140016 | elapsed time per iteration (ms): 15512.2 | learning rate: 3.875E-05 | global batch size: 48 | lm loss: 6.383876E+00 | loss scale: 4096.0 | grad norm: 63811.158 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5379/ 159576 | consumed samples: 140064 | elapsed time per iteration (ms): 15885.3 | learning rate: 3.876E-05 | global batch size: 48 | lm loss: 6.430426E+00 | loss scale: 4096.0 | grad norm: 71627.105 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5380/ 159576 | consumed samples: 140112 | elapsed time per iteration (ms): 15514.4 | learning 
rate: 3.878E-05 | global batch size: 48 | lm loss: 6.352599E+00 | loss scale: 4096.0 | grad norm: 55768.573 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5381/ 159576 | consumed samples: 140160 | elapsed time per iteration (ms): 15536.5 | learning rate: 3.879E-05 | global batch size: 48 | lm loss: 6.462265E+00 | loss scale: 4096.0 | grad norm: 76307.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5382/ 159576 | consumed samples: 140208 | elapsed time per iteration (ms): 15499.8 | learning rate: 3.880E-05 | global batch size: 48 | lm loss: 6.439154E+00 | loss scale: 4096.0 | grad norm: 97619.861 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5383/ 159576 | consumed samples: 140256 | elapsed time per iteration (ms): 15693.9 | learning rate: 3.882E-05 | global batch size: 48 | lm loss: 6.327425E+00 | loss scale: 4096.0 | grad norm: 69803.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5384/ 159576 | consumed samples: 140304 | elapsed time per iteration (ms): 15550.5 | learning rate: 3.883E-05 | global batch size: 48 | lm loss: 6.391693E+00 | loss scale: 4096.0 | grad norm: 66211.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5385/ 159576 | consumed samples: 140352 | elapsed time per iteration (ms): 15520.0 | learning rate: 3.884E-05 | global batch size: 48 | lm loss: 6.323473E+00 | loss scale: 4096.0 | grad norm: 68034.810 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5386/ 159576 | consumed samples: 140400 | elapsed time per iteration (ms): 15545.0 | learning rate: 3.886E-05 | global batch size: 48 | lm loss: 6.299393E+00 | loss scale: 4096.0 | grad norm: 85492.599 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5387/ 159576 | consumed samples: 140448 | elapsed time per iteration (ms): 15684.9 | learning rate: 3.887E-05 | global batch size: 48 | lm loss: 6.374225E+00 | loss scale: 4096.0 | grad norm: 72949.757 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5388/ 159576 | consumed samples: 140496 | elapsed time per iteration (ms): 15553.2 | learning rate: 3.888E-05 | global batch size: 48 | lm loss: 6.446224E+00 | loss scale: 4096.0 | grad norm: 83315.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5389/ 159576 | consumed samples: 140544 | elapsed time per iteration (ms): 15520.1 | learning rate: 3.890E-05 | global batch size: 48 | lm loss: 6.336344E+00 | loss scale: 4096.0 | grad norm: 60566.619 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5390/ 159576 | consumed samples: 140592 | elapsed time per iteration (ms): 15438.2 | learning rate: 3.891E-05 | global batch size: 48 | lm loss: 6.437949E+00 | loss scale: 4096.0 | grad norm: 93800.672 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5391/ 159576 | consumed samples: 140640 | elapsed time per iteration (ms): 15842.4 | learning rate: 3.892E-05 | global batch size: 48 | lm loss: 6.445059E+00 | loss scale: 4096.0 | grad norm: 67207.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time 
(ms) iteration 5392/ 159576 | consumed samples: 140688 | elapsed time per iteration (ms): 15543.4 | learning rate: 3.894E-05 | global batch size: 48 | lm loss: 6.340952E+00 | loss scale: 4096.0 | grad norm: 92289.634 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5393/ 159576 | consumed samples: 140736 | elapsed time per iteration (ms): 15518.9 | learning rate: 3.895E-05 | global batch size: 48 | lm loss: 6.416577E+00 | loss scale: 4096.0 | grad norm: 84099.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5394/ 159576 | consumed samples: 140784 | elapsed time per iteration (ms): 15997.3 | learning rate: 3.896E-05 | global batch size: 48 | lm loss: 6.439622E+00 | loss scale: 4096.0 | grad norm: 54809.573 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5395/ 159576 | consumed samples: 140832 | elapsed time per iteration (ms): 15450.3 | learning rate: 3.898E-05 | global batch size: 48 | lm loss: 6.441430E+00 | loss scale: 4096.0 | grad norm: 63144.662 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5396/ 159576 | consumed samples: 140880 | elapsed time per iteration (ms): 15568.2 | learning rate: 3.899E-05 | global batch size: 48 | lm loss: 6.424047E+00 | loss scale: 4096.0 | grad norm: 106261.057 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5397/ 159576 | consumed samples: 140928 | elapsed time per iteration (ms): 15464.4 | learning rate: 3.900E-05 | global batch size: 48 | lm loss: 6.325677E+00 | loss scale: 4096.0 | grad norm: 64383.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5398/ 159576 | consumed samples: 140976 | elapsed time per iteration (ms): 15883.9 | learning rate: 3.902E-05 | global batch size: 48 | lm loss: 6.582463E+00 | loss scale: 4096.0 | grad norm: 66662.490 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5399/ 159576 | consumed samples: 141024 | elapsed time per iteration (ms): 15497.5 | learning rate: 3.903E-05 | global batch size: 48 | lm loss: 6.498641E+00 | loss scale: 4096.0 | grad norm: 59391.511 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5400/ 159576 | consumed samples: 141072 | elapsed time per iteration (ms): 15569.9 | learning rate: 3.904E-05 | global batch size: 48 | lm loss: 6.283938E+00 | loss scale: 4096.0 | grad norm: 64487.813 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5401/ 159576 | consumed samples: 141120 | elapsed time per iteration (ms): 15526.8 | learning rate: 3.906E-05 | global batch size: 48 | lm loss: 6.336715E+00 | loss scale: 4096.0 | grad norm: 57781.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5402/ 159576 | consumed samples: 141168 | elapsed time per iteration (ms): 15981.6 | learning rate: 3.907E-05 | global batch size: 48 | lm loss: 6.293415E+00 | loss scale: 4096.0 | grad norm: 92738.567 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5403/ 159576 | consumed samples: 141216 | elapsed time per iteration (ms): 15632.0 | learning rate: 3.908E-05 | global batch size: 48 | lm loss: 6.294649E+00 | loss scale: 4096.0 | 
grad norm: 62910.047 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5404/ 159576 | consumed samples: 141264 | elapsed time per iteration (ms): 15497.6 | learning rate: 3.910E-05 | global batch size: 48 | lm loss: 6.331801E+00 | loss scale: 4096.0 | grad norm: 64648.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5405/ 159576 | consumed samples: 141312 | elapsed time per iteration (ms): 15498.1 | learning rate: 3.911E-05 | global batch size: 48 | lm loss: 6.406822E+00 | loss scale: 4096.0 | grad norm: 71416.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5406/ 159576 | consumed samples: 141360 | elapsed time per iteration (ms): 15867.4 | learning rate: 3.912E-05 | global batch size: 48 | lm loss: 6.404875E+00 | loss scale: 4096.0 | grad norm: 56955.709 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5407/ 159576 | consumed samples: 141408 | elapsed time per iteration (ms): 15506.2 | learning rate: 3.914E-05 | global batch size: 48 | lm loss: 6.428100E+00 | loss scale: 4096.0 | grad norm: 65410.012 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5408/ 159576 | consumed samples: 141456 | elapsed time per iteration (ms): 15573.9 | learning rate: 3.915E-05 | global batch size: 48 | lm loss: 6.352518E+00 | loss scale: 4096.0 | grad norm: 57463.162 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5409/ 159576 | consumed samples: 141504 | elapsed time per iteration (ms): 15570.8 | learning rate: 3.916E-05 | global batch size: 48 | lm loss: 6.276915E+00 | loss scale: 4096.0 | grad norm: 56808.465 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5410/ 159576 | consumed samples: 141552 | elapsed time per iteration (ms): 15647.9 | learning rate: 3.918E-05 | global batch size: 48 | lm loss: 6.388402E+00 | loss scale: 4096.0 | grad norm: 55831.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5411/ 159576 | consumed samples: 141600 | elapsed time per iteration (ms): 15527.8 | learning rate: 3.919E-05 | global batch size: 48 | lm loss: 6.359324E+00 | loss scale: 4096.0 | grad norm: 58176.863 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5412/ 159576 | consumed samples: 141648 | elapsed time per iteration (ms): 15485.9 | learning rate: 3.920E-05 | global batch size: 48 | lm loss: 6.410316E+00 | loss scale: 4096.0 | grad norm: 58797.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5413/ 159576 | consumed samples: 141696 | elapsed time per iteration (ms): 15570.6 | learning rate: 3.922E-05 | global batch size: 48 | lm loss: 6.487602E+00 | loss scale: 4096.0 | grad norm: 54779.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5414/ 159576 | consumed samples: 141744 | elapsed time per iteration (ms): 15692.4 | learning rate: 3.923E-05 | global batch size: 48 | lm loss: 6.538764E+00 | loss scale: 4096.0 | grad norm: 56952.810 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5415/ 159576 | consumed samples: 141808 | elapsed time per iteration (ms): 
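The rampup is visible directly in the raw numbers: through iteration 5414 the consumed-samples counter advances by exactly 48 per step (136224 up to 141744), and from iteration 5415 it advances by 64 (141808, 141872, ...), while elapsed time per iteration rises from ~15.5 s to ~16.4 s. A minimal sketch of that consistency check in Python; the (iteration, consumed samples, global batch size) tuples are copied verbatim from the surrounding log, and the records list is illustrative, not original tooling:

# Check that the delta in "consumed samples" between consecutive records
# equals the global batch size reported by the later record.
records = [
    (5413, 141696, 48),
    (5414, 141744, 48),
    (5415, 141808, 64),  # rampup step: global batch size goes 48 -> 64 here
    (5416, 141872, 64),
]
for (_, prev_consumed, _), (it, consumed, gbs) in zip(records, records[1:]):
    delta = consumed - prev_consumed
    assert delta == gbs, f"iteration {it}: delta {delta} != global batch size {gbs}"
print("consumed-samples deltas match the reported global batch size")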
[log condensed: iterations 5415-5529 of 159576, global batch size 64 throughout. Constant fields remain loss scale: 4096.0 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0. Over the span, lm loss ranges ~6.237-6.515, grad norm ~48.0k-132.0k, elapsed time per iteration ~16.3-16.9 s, and the learning rate climbs from 3.925E-05 to 4.127E-05. Representative records, roughly one per 10 iterations, with the two PULSE status lines kept in chronological position:]

 iteration 5415/ 159576 | consumed samples: 141808 | elapsed time per iteration (ms): 16423.4 | learning rate: 3.925E-05 | lm loss: 6.468464E+00 | grad norm: 47962.953
 iteration 5420/ 159576 | consumed samples: 142128 | elapsed time per iteration (ms): 16358.3 | learning rate: 3.934E-05 | lm loss: 6.505975E+00 | grad norm: 76759.970
 iteration 5430/ 159576 | consumed samples: 142768 | elapsed time per iteration (ms): 16430.1 | learning rate: 3.951E-05 | lm loss: 6.406096E+00 | grad norm: 72027.252
 iteration 5440/ 159576 | consumed samples: 143408 | elapsed time per iteration (ms): 16516.9 | learning rate: 3.969E-05 | lm loss: 6.414165E+00 | grad norm: 64434.632
 iteration 5450/ 159576 | consumed samples: 144048 | elapsed time per iteration (ms): 16855.4 | learning rate: 3.987E-05 | lm loss: 6.407839E+00 | grad norm: 76751.714
 iteration 5460/ 159576 | consumed samples: 144688 | elapsed time per iteration (ms): 16422.6 | learning rate: 4.005E-05 | lm loss: 6.374327E+00 | grad norm: 77883.993
 iteration 5470/ 159576 | consumed samples: 145328 | elapsed time per iteration (ms): 16424.7 | learning rate: 4.022E-05 | lm loss: 6.396588E+00 | grad norm: 83281.076
 iteration 5480/ 159576 | consumed samples: 145968 | elapsed time per iteration (ms): 16483.8 | learning rate: 4.040E-05 | lm loss: 6.487626E+00 | grad norm: 68136.689

[2021-09-25 00:07:47] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition)
[2021-09-25 00:07:47] PULSE: tr8-104B is running for 18:15:36 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8]))

 iteration 5481/ 159576 | consumed samples: 146032 | elapsed time per iteration (ms): 16373.0 | learning rate: 4.042E-05 | lm loss: 6.280901E+00 | grad norm: 89214.030
 iteration 5490/ 159576 | consumed samples: 146608 | elapsed time per iteration (ms): 16827.2 | learning rate: 4.058E-05 | lm loss: 6.479604E+00 | grad norm: 63855.765
 iteration 5500/ 159576 | consumed samples: 147248 | elapsed time per iteration (ms): 16337.1 | learning rate: 4.076E-05 | lm loss: 6.289442E+00 | grad norm: 74687.965
 iteration 5510/ 159576 | consumed samples: 147888 | elapsed time per iteration (ms): 16379.6 | learning rate: 4.093E-05 | lm loss: 6.403854E+00 | grad norm: 131977.376
 iteration 5520/ 159576 | consumed samples: 148528 | elapsed time per iteration (ms): 16346.7 | learning rate: 4.111E-05 | lm loss: 6.422507E+00 | grad norm: 57013.348
 iteration 5529/ 159576 | consumed samples: 149104 | elapsed time per iteration (ms): 16353.3 | learning rate: 4.127E-05 | lm loss: 6.376573E+00 | grad norm: 80155.568
iteration 5530/ 159576 | consumed samples: 149168 | elapsed time per iteration (ms): 16695.5 | learning rate: 4.129E-05 | global batch size: 64 | lm loss: 6.316214E+00 | loss scale: 4096.0 | grad norm: 87358.885 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5531/ 159576 | consumed samples: 149232 | elapsed time per iteration (ms): 16408.8 | learning rate: 4.131E-05 | global batch size: 64 | lm loss: 6.481884E+00 | loss scale: 4096.0 | grad norm: 86550.581 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5532/ 159576 | consumed samples: 149296 | elapsed time per iteration (ms): 16343.8 | learning rate: 4.133E-05 | global batch size: 64 | lm loss: 6.483734E+00 | loss scale: 4096.0 | grad norm: 89939.876 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5533/ 159576 | consumed samples: 149360 | elapsed time per iteration (ms): 16370.7 | learning rate: 4.134E-05 | global batch size: 64 | lm loss: 6.318271E+00 | loss scale: 4096.0 | grad norm: 60516.549 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5534/ 159576 | consumed samples: 149424 | elapsed time per iteration (ms): 16594.8 | learning rate: 4.136E-05 | global batch size: 64 | lm loss: 6.391500E+00 | loss scale: 4096.0 | grad norm: 70379.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5535/ 159576 | consumed samples: 149488 | elapsed time per iteration (ms): 16425.6 | learning rate: 4.138E-05 | global batch size: 64 | lm loss: 6.418231E+00 | loss scale: 4096.0 | grad norm: 76225.739 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5536/ 159576 | consumed samples: 149552 | elapsed time per iteration (ms): 16364.4 | learning rate: 4.140E-05 | global batch size: 64 | lm loss: 6.461292E+00 | loss scale: 4096.0 | grad norm: 117347.500 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5537/ 159576 | consumed samples: 149616 | elapsed time per iteration (ms): 16683.3 | learning rate: 4.141E-05 | global batch size: 64 | lm loss: 6.394395E+00 | loss scale: 4096.0 | grad norm: 113236.928 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5538/ 159576 | consumed samples: 149680 | elapsed time per iteration (ms): 16407.6 | learning rate: 4.143E-05 | global batch size: 64 | lm loss: 6.348366E+00 | loss scale: 4096.0 | grad norm: 72699.803 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5539/ 159576 | consumed samples: 149744 | elapsed time per iteration (ms): 16372.4 | learning rate: 4.145E-05 | global batch size: 64 | lm loss: 6.395003E+00 | loss scale: 4096.0 | grad norm: 117054.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5540/ 159576 | consumed samples: 149808 | elapsed time per iteration (ms): 16344.7 | learning rate: 4.147E-05 | global batch size: 64 | lm loss: 6.345469E+00 | loss scale: 4096.0 | grad norm: 66826.178 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5541/ 159576 | consumed samples: 149872 | elapsed time per iteration (ms): 16658.7 | learning rate: 4.149E-05 | global batch size: 64 | lm loss: 6.311511E+00 | loss scale: 4096.0 | grad 
norm: 82398.862 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5542/ 159576 | consumed samples: 149936 | elapsed time per iteration (ms): 16382.8 | learning rate: 4.150E-05 | global batch size: 64 | lm loss: 6.407408E+00 | loss scale: 4096.0 | grad norm: 95381.993 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5543/ 159576 | consumed samples: 150000 | elapsed time per iteration (ms): 16397.3 | learning rate: 4.152E-05 | global batch size: 64 | lm loss: 6.385950E+00 | loss scale: 4096.0 | grad norm: 84966.860 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5544/ 159576 | consumed samples: 150064 | elapsed time per iteration (ms): 16328.2 | learning rate: 4.154E-05 | global batch size: 64 | lm loss: 6.386173E+00 | loss scale: 4096.0 | grad norm: 76280.982 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5545/ 159576 | consumed samples: 150128 | elapsed time per iteration (ms): 16536.9 | learning rate: 4.156E-05 | global batch size: 64 | lm loss: 6.429965E+00 | loss scale: 4096.0 | grad norm: 86199.770 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5546/ 159576 | consumed samples: 150192 | elapsed time per iteration (ms): 16341.0 | learning rate: 4.157E-05 | global batch size: 64 | lm loss: 6.440814E+00 | loss scale: 4096.0 | grad norm: 79643.661 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5547/ 159576 | consumed samples: 150256 | elapsed time per iteration (ms): 16434.5 | learning rate: 4.159E-05 | global batch size: 64 | lm loss: 6.292027E+00 | loss scale: 4096.0 | grad norm: 79649.706 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5548/ 159576 | consumed samples: 150320 | elapsed time per iteration (ms): 16744.8 | learning rate: 4.161E-05 | global batch size: 64 | lm loss: 6.363777E+00 | loss scale: 4096.0 | grad norm: 105818.884 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5549/ 159576 | consumed samples: 150384 | elapsed time per iteration (ms): 16446.0 | learning rate: 4.163E-05 | global batch size: 64 | lm loss: 6.525520E+00 | loss scale: 4096.0 | grad norm: 98900.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5550/ 159576 | consumed samples: 150448 | elapsed time per iteration (ms): 16313.7 | learning rate: 4.164E-05 | global batch size: 64 | lm loss: 6.426298E+00 | loss scale: 4096.0 | grad norm: 160080.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5551/ 159576 | consumed samples: 150512 | elapsed time per iteration (ms): 16414.2 | learning rate: 4.166E-05 | global batch size: 64 | lm loss: 6.409907E+00 | loss scale: 4096.0 | grad norm: 101291.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5552/ 159576 | consumed samples: 150576 | elapsed time per iteration (ms): 16772.9 | learning rate: 4.168E-05 | global batch size: 64 | lm loss: 6.312022E+00 | loss scale: 4096.0 | grad norm: 93961.085 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5553/ 159576 | consumed samples: 150640 | elapsed time per iteration (ms): 
16393.9 | learning rate: 4.170E-05 | global batch size: 64 | lm loss: 6.460764E+00 | loss scale: 4096.0 | grad norm: 83044.555 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5554/ 159576 | consumed samples: 150704 | elapsed time per iteration (ms): 16414.7 | learning rate: 4.172E-05 | global batch size: 64 | lm loss: 6.395907E+00 | loss scale: 4096.0 | grad norm: 71935.935 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5555/ 159576 | consumed samples: 150768 | elapsed time per iteration (ms): 16459.3 | learning rate: 4.173E-05 | global batch size: 64 | lm loss: 6.381772E+00 | loss scale: 4096.0 | grad norm: 92358.447 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5556/ 159576 | consumed samples: 150832 | elapsed time per iteration (ms): 16620.5 | learning rate: 4.175E-05 | global batch size: 64 | lm loss: 6.334103E+00 | loss scale: 4096.0 | grad norm: 135953.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5557/ 159576 | consumed samples: 150896 | elapsed time per iteration (ms): 16420.0 | learning rate: 4.177E-05 | global batch size: 64 | lm loss: 6.350534E+00 | loss scale: 4096.0 | grad norm: 106866.155 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5558/ 159576 | consumed samples: 150960 | elapsed time per iteration (ms): 16394.5 | learning rate: 4.179E-05 | global batch size: 64 | lm loss: 6.449617E+00 | loss scale: 4096.0 | grad norm: 73758.521 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5559/ 159576 | consumed samples: 151024 | elapsed time per iteration (ms): 16702.3 | learning rate: 4.180E-05 | global batch size: 64 | lm loss: 6.422152E+00 | loss scale: 4096.0 | grad norm: 89216.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5560/ 159576 | consumed samples: 151088 | elapsed time per iteration (ms): 16526.0 | learning rate: 4.182E-05 | global batch size: 64 | lm loss: 6.502412E+00 | loss scale: 4096.0 | grad norm: 75899.056 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5561/ 159576 | consumed samples: 151152 | elapsed time per iteration (ms): 16388.8 | learning rate: 4.184E-05 | global batch size: 64 | lm loss: 6.353260E+00 | loss scale: 4096.0 | grad norm: 77216.880 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5562/ 159576 | consumed samples: 151216 | elapsed time per iteration (ms): 16375.8 | learning rate: 4.186E-05 | global batch size: 64 | lm loss: 6.380834E+00 | loss scale: 4096.0 | grad norm: 108978.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5563/ 159576 | consumed samples: 151280 | elapsed time per iteration (ms): 16840.5 | learning rate: 4.188E-05 | global batch size: 64 | lm loss: 6.389106E+00 | loss scale: 4096.0 | grad norm: 109665.709 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5564/ 159576 | consumed samples: 151344 | elapsed time per iteration (ms): 16437.6 | learning rate: 4.189E-05 | global batch size: 64 | lm loss: 6.440452E+00 | loss scale: 4096.0 | grad norm: 455190.539 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | time (ms) iteration 5565/ 159576 | consumed samples: 151408 | elapsed time per iteration (ms): 16403.9 | learning rate: 4.191E-05 | global batch size: 64 | lm loss: 6.425446E+00 | loss scale: 4096.0 | grad norm: 121150.795 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5566/ 159576 | consumed samples: 151472 | elapsed time per iteration (ms): 16435.1 | learning rate: 4.193E-05 | global batch size: 64 | lm loss: 6.344089E+00 | loss scale: 4096.0 | grad norm: 92189.151 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5567/ 159576 | consumed samples: 151536 | elapsed time per iteration (ms): 16459.4 | learning rate: 4.195E-05 | global batch size: 64 | lm loss: 6.402337E+00 | loss scale: 4096.0 | grad norm: 84995.771 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5568/ 159576 | consumed samples: 151600 | elapsed time per iteration (ms): 16389.2 | learning rate: 4.196E-05 | global batch size: 64 | lm loss: 6.522965E+00 | loss scale: 4096.0 | grad norm: 82583.540 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5569/ 159576 | consumed samples: 151664 | elapsed time per iteration (ms): 16371.9 | learning rate: 4.198E-05 | global batch size: 64 | lm loss: 6.357002E+00 | loss scale: 4096.0 | grad norm: 107776.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5570/ 159576 | consumed samples: 151728 | elapsed time per iteration (ms): 16715.6 | learning rate: 4.200E-05 | global batch size: 64 | lm loss: 6.462955E+00 | loss scale: 4096.0 | grad norm: 81656.007 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5571/ 159576 | consumed samples: 151792 | elapsed time per iteration (ms): 16448.5 | learning rate: 4.202E-05 | global batch size: 64 | lm loss: 6.378518E+00 | loss scale: 4096.0 | grad norm: 97168.529 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5572/ 159576 | consumed samples: 151856 | elapsed time per iteration (ms): 16375.2 | learning rate: 4.204E-05 | global batch size: 64 | lm loss: 6.426227E+00 | loss scale: 4096.0 | grad norm: 138499.065 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5573/ 159576 | consumed samples: 151920 | elapsed time per iteration (ms): 16391.0 | learning rate: 4.205E-05 | global batch size: 64 | lm loss: 6.467142E+00 | loss scale: 4096.0 | grad norm: 86986.159 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5574/ 159576 | consumed samples: 151984 | elapsed time per iteration (ms): 16660.3 | learning rate: 4.207E-05 | global batch size: 64 | lm loss: 6.343758E+00 | loss scale: 4096.0 | grad norm: 94104.183 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5575/ 159576 | consumed samples: 152048 | elapsed time per iteration (ms): 16384.3 | learning rate: 4.209E-05 | global batch size: 64 | lm loss: 6.264513E+00 | loss scale: 4096.0 | grad norm: 84463.915 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5576/ 159576 | consumed samples: 152112 | elapsed time per iteration (ms): 16429.0 | learning rate: 4.211E-05 | global batch size: 64 | lm loss: 6.395695E+00 | 
loss scale: 4096.0 | grad norm: 91060.071 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5577/ 159576 | consumed samples: 152176 | elapsed time per iteration (ms): 16399.6 | learning rate: 4.212E-05 | global batch size: 64 | lm loss: 6.322819E+00 | loss scale: 4096.0 | grad norm: 78884.092 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5578/ 159576 | consumed samples: 152240 | elapsed time per iteration (ms): 16529.4 | learning rate: 4.214E-05 | global batch size: 64 | lm loss: 6.361033E+00 | loss scale: 4096.0 | grad norm: 132712.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5579/ 159576 | consumed samples: 152304 | elapsed time per iteration (ms): 16454.4 | learning rate: 4.216E-05 | global batch size: 64 | lm loss: 6.276022E+00 | loss scale: 4096.0 | grad norm: 112417.567 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5580/ 159576 | consumed samples: 152368 | elapsed time per iteration (ms): 16401.1 | learning rate: 4.218E-05 | global batch size: 64 | lm loss: 6.375633E+00 | loss scale: 4096.0 | grad norm: 85824.899 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5581/ 159576 | consumed samples: 152432 | elapsed time per iteration (ms): 16688.1 | learning rate: 4.220E-05 | global batch size: 64 | lm loss: 6.447036E+00 | loss scale: 4096.0 | grad norm: 88314.135 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5582/ 159576 | consumed samples: 152496 | elapsed time per iteration (ms): 16427.8 | learning rate: 4.221E-05 | global batch size: 64 | lm loss: 6.438461E+00 | loss scale: 4096.0 | grad norm: 91826.151 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5583/ 159576 | consumed samples: 152560 | elapsed time per iteration (ms): 16326.4 | learning rate: 4.223E-05 | global batch size: 64 | lm loss: 6.404251E+00 | loss scale: 4096.0 | grad norm: 79746.044 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5584/ 159576 | consumed samples: 152624 | elapsed time per iteration (ms): 16429.7 | learning rate: 4.225E-05 | global batch size: 64 | lm loss: 6.470784E+00 | loss scale: 4096.0 | grad norm: 78255.053 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5585/ 159576 | consumed samples: 152688 | elapsed time per iteration (ms): 16577.7 | learning rate: 4.227E-05 | global batch size: 64 | lm loss: 6.352365E+00 | loss scale: 4096.0 | grad norm: 85894.611 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5586/ 159576 | consumed samples: 152752 | elapsed time per iteration (ms): 16409.6 | learning rate: 4.228E-05 | global batch size: 64 | lm loss: 6.367690E+00 | loss scale: 4096.0 | grad norm: 268686.463 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5587/ 159576 | consumed samples: 152816 | elapsed time per iteration (ms): 16393.7 | learning rate: 4.230E-05 | global batch size: 64 | lm loss: 6.334382E+00 | loss scale: 4096.0 | grad norm: 92996.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5588/ 159576 | consumed samples: 152880 | elapsed 
time per iteration (ms): 16647.8 | learning rate: 4.232E-05 | global batch size: 64 | lm loss: 6.174354E+00 | loss scale: 4096.0 | grad norm: 99570.185 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5589/ 159576 | consumed samples: 152944 | elapsed time per iteration (ms): 16470.5 | learning rate: 4.234E-05 | global batch size: 64 | lm loss: 6.349049E+00 | loss scale: 4096.0 | grad norm: 74523.491 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5590/ 159576 | consumed samples: 153008 | elapsed time per iteration (ms): 16348.7 | learning rate: 4.236E-05 | global batch size: 64 | lm loss: 6.388356E+00 | loss scale: 4096.0 | grad norm: 57623.843 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5591/ 159576 | consumed samples: 153072 | elapsed time per iteration (ms): 16338.9 | learning rate: 4.237E-05 | global batch size: 64 | lm loss: 6.399694E+00 | loss scale: 4096.0 | grad norm: 75852.068 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5592/ 159576 | consumed samples: 153136 | elapsed time per iteration (ms): 16704.7 | learning rate: 4.239E-05 | global batch size: 64 | lm loss: 6.327959E+00 | loss scale: 4096.0 | grad norm: 69452.758 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5593/ 159576 | consumed samples: 153200 | elapsed time per iteration (ms): 16334.3 | learning rate: 4.241E-05 | global batch size: 64 | lm loss: 6.435533E+00 | loss scale: 4096.0 | grad norm: 111529.645 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5594/ 159576 | consumed samples: 153264 | elapsed time per iteration (ms): 16385.3 | learning rate: 4.243E-05 | global batch size: 64 | lm loss: 6.438297E+00 | loss scale: 4096.0 | grad norm: 154695.606 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5595/ 159576 | consumed samples: 153328 | elapsed time per iteration (ms): 16343.1 | learning rate: 4.244E-05 | global batch size: 64 | lm loss: 6.431480E+00 | loss scale: 4096.0 | grad norm: 133987.791 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5596/ 159576 | consumed samples: 153392 | elapsed time per iteration (ms): 16571.5 | learning rate: 4.246E-05 | global batch size: 64 | lm loss: 6.326744E+00 | loss scale: 4096.0 | grad norm: 65072.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5597/ 159576 | consumed samples: 153456 | elapsed time per iteration (ms): 16304.0 | learning rate: 4.248E-05 | global batch size: 64 | lm loss: 6.450805E+00 | loss scale: 4096.0 | grad norm: 67613.081 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5598/ 159576 | consumed samples: 153520 | elapsed time per iteration (ms): 16343.8 | learning rate: 4.250E-05 | global batch size: 64 | lm loss: 6.327376E+00 | loss scale: 4096.0 | grad norm: 77614.563 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5599/ 159576 | consumed samples: 153584 | elapsed time per iteration (ms): 16672.4 | learning rate: 4.251E-05 | global batch size: 64 | lm loss: 6.502485E+00 | loss scale: 4096.0 | grad norm: 97568.320 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5600/ 159576 | consumed samples: 153648 | elapsed time per iteration (ms): 16410.3 | learning rate: 4.253E-05 | global batch size: 64 | lm loss: 6.429380E+00 | loss scale: 4096.0 | grad norm: 84231.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5601/ 159576 | consumed samples: 153712 | elapsed time per iteration (ms): 16391.0 | learning rate: 4.255E-05 | global batch size: 64 | lm loss: 6.436201E+00 | loss scale: 4096.0 | grad norm: 63319.618 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5602/ 159576 | consumed samples: 153776 | elapsed time per iteration (ms): 16453.8 | learning rate: 4.257E-05 | global batch size: 64 | lm loss: 6.263167E+00 | loss scale: 4096.0 | grad norm: 71392.865 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5603/ 159576 | consumed samples: 153840 | elapsed time per iteration (ms): 16775.3 | learning rate: 4.259E-05 | global batch size: 64 | lm loss: 6.413259E+00 | loss scale: 4096.0 | grad norm: 123761.558 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5604/ 159576 | consumed samples: 153904 | elapsed time per iteration (ms): 16504.7 | learning rate: 4.260E-05 | global batch size: 64 | lm loss: 6.544505E+00 | loss scale: 4096.0 | grad norm: 83624.860 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5605/ 159576 | consumed samples: 153968 | elapsed time per iteration (ms): 16306.6 | learning rate: 4.262E-05 | global batch size: 64 | lm loss: 6.452788E+00 | loss scale: 8192.0 | grad norm: 65011.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5606/ 159576 | consumed samples: 154032 | elapsed time per iteration (ms): 16378.4 | learning rate: 4.264E-05 | global batch size: 64 | lm loss: 6.422714E+00 | loss scale: 8192.0 | grad norm: 246798.721 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5607/ 159576 | consumed samples: 154096 | elapsed time per iteration (ms): 16552.8 | learning rate: 4.266E-05 | global batch size: 64 | lm loss: 6.375990E+00 | loss scale: 8192.0 | grad norm: 169739.944 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5608/ 159576 | consumed samples: 154160 | elapsed time per iteration (ms): 16382.8 | learning rate: 4.267E-05 | global batch size: 64 | lm loss: 6.358736E+00 | loss scale: 8192.0 | grad norm: 157950.735 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5609/ 159576 | consumed samples: 154224 | elapsed time per iteration (ms): 16422.0 | learning rate: 4.269E-05 | global batch size: 64 | lm loss: 6.444921E+00 | loss scale: 8192.0 | grad norm: 125911.826 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5610/ 159576 | consumed samples: 154288 | elapsed time per iteration (ms): 9561.0 | learning rate: 4.269E-05 | global batch size: 64 | lm loss: 6.367582E+00 | loss scale: 8192.0 | grad norm: 125911.826 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5611/ 159576 | consumed samples: 154352 | elapsed time per iteration (ms): 16020.4 | learning rate: 4.271E-05 | global batch 
size: 64 | lm loss: 6.341266E+00 | loss scale: 8192.0 | grad norm: 196277.090 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5612/ 159576 | consumed samples: 154416 | elapsed time per iteration (ms): 16411.4 | learning rate: 4.273E-05 | global batch size: 64 | lm loss: 6.386235E+00 | loss scale: 8192.0 | grad norm: 174236.115 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5613/ 159576 | consumed samples: 154480 | elapsed time per iteration (ms): 16406.8 | learning rate: 4.275E-05 | global batch size: 64 | lm loss: 6.302393E+00 | loss scale: 8192.0 | grad norm: 159949.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5614/ 159576 | consumed samples: 154544 | elapsed time per iteration (ms): 16823.0 | learning rate: 4.276E-05 | global batch size: 64 | lm loss: 6.427998E+00 | loss scale: 8192.0 | grad norm: 139822.570 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5615/ 159576 | consumed samples: 154608 | elapsed time per iteration (ms): 16523.9 | learning rate: 4.278E-05 | global batch size: 64 | lm loss: 6.437964E+00 | loss scale: 8192.0 | grad norm: 148561.538 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5616/ 159576 | consumed samples: 154672 | elapsed time per iteration (ms): 16444.1 | learning rate: 4.280E-05 | global batch size: 64 | lm loss: 6.387279E+00 | loss scale: 8192.0 | grad norm: 165172.047 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5617/ 159576 | consumed samples: 154736 | elapsed time per iteration (ms): 16455.6 | learning rate: 4.282E-05 | global batch size: 64 | lm loss: 6.365323E+00 | loss scale: 8192.0 | grad norm: 139740.137 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5618/ 159576 | consumed samples: 154800 | elapsed time per iteration (ms): 16876.6 | learning rate: 4.283E-05 | global batch size: 64 | lm loss: 6.405371E+00 | loss scale: 8192.0 | grad norm: 191865.773 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5619/ 159576 | consumed samples: 154864 | elapsed time per iteration (ms): 16465.6 | learning rate: 4.285E-05 | global batch size: 64 | lm loss: 6.400004E+00 | loss scale: 8192.0 | grad norm: 131301.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5620/ 159576 | consumed samples: 154928 | elapsed time per iteration (ms): 16407.9 | learning rate: 4.287E-05 | global batch size: 64 | lm loss: 6.424757E+00 | loss scale: 8192.0 | grad norm: 152162.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5621/ 159576 | consumed samples: 154992 | elapsed time per iteration (ms): 16429.7 | learning rate: 4.289E-05 | global batch size: 64 | lm loss: 6.415905E+00 | loss scale: 8192.0 | grad norm: 184054.677 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5622/ 159576 | consumed samples: 155056 | elapsed time per iteration (ms): 16685.6 | learning rate: 4.291E-05 | global batch size: 64 | lm loss: 6.440601E+00 | loss scale: 8192.0 | grad norm: 290641.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5623/ 
159576 | consumed samples: 155120 | elapsed time per iteration (ms): 16500.9 | learning rate: 4.292E-05 | global batch size: 64 | lm loss: 6.392663E+00 | loss scale: 8192.0 | grad norm: 151394.636 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5624/ 159576 | consumed samples: 155184 | elapsed time per iteration (ms): 16485.6 | learning rate: 4.294E-05 | global batch size: 64 | lm loss: 6.440325E+00 | loss scale: 8192.0 | grad norm: 132735.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5625/ 159576 | consumed samples: 155248 | elapsed time per iteration (ms): 16832.2 | learning rate: 4.296E-05 | global batch size: 64 | lm loss: 6.382560E+00 | loss scale: 8192.0 | grad norm: 167706.666 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5626/ 159576 | consumed samples: 155312 | elapsed time per iteration (ms): 16294.5 | learning rate: 4.298E-05 | global batch size: 64 | lm loss: 6.422318E+00 | loss scale: 8192.0 | grad norm: 144671.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5627/ 159576 | consumed samples: 155376 | elapsed time per iteration (ms): 16433.6 | learning rate: 4.299E-05 | global batch size: 64 | lm loss: 6.400022E+00 | loss scale: 8192.0 | grad norm: 174837.579 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5628/ 159576 | consumed samples: 155440 | elapsed time per iteration (ms): 16385.0 | learning rate: 4.301E-05 | global batch size: 64 | lm loss: 6.465958E+00 | loss scale: 8192.0 | grad norm: 167317.809 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5629/ 159576 | consumed samples: 155504 | elapsed time per iteration (ms): 16829.3 | learning rate: 4.303E-05 | global batch size: 64 | lm loss: 6.365539E+00 | loss scale: 8192.0 | grad norm: 150073.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5630/ 159576 | consumed samples: 155568 | elapsed time per iteration (ms): 16533.0 | learning rate: 4.305E-05 | global batch size: 64 | lm loss: 6.385098E+00 | loss scale: 8192.0 | grad norm: 132923.540 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5631/ 159576 | consumed samples: 155632 | elapsed time per iteration (ms): 16451.7 | learning rate: 4.307E-05 | global batch size: 64 | lm loss: 6.314290E+00 | loss scale: 8192.0 | grad norm: 178222.521 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5632/ 159576 | consumed samples: 155696 | elapsed time per iteration (ms): 16400.8 | learning rate: 4.308E-05 | global batch size: 64 | lm loss: 6.467572E+00 | loss scale: 8192.0 | grad norm: 147545.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5633/ 159576 | consumed samples: 155760 | elapsed time per iteration (ms): 16566.1 | learning rate: 4.310E-05 | global batch size: 64 | lm loss: 6.341013E+00 | loss scale: 8192.0 | grad norm: 200712.657 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5634/ 159576 | consumed samples: 155824 | elapsed time per iteration (ms): 16393.9 | learning rate: 4.312E-05 | global batch size: 64 | lm loss: 6.319093E+00 | loss scale: 8192.0 | grad norm: 
161666.056 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5635/ 159576 | consumed samples: 155888 | elapsed time per iteration (ms): 16416.9 | learning rate: 4.314E-05 | global batch size: 64 | lm loss: 6.461274E+00 | loss scale: 8192.0 | grad norm: 572124.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5636/ 159576 | consumed samples: 155952 | elapsed time per iteration (ms): 16756.4 | learning rate: 4.315E-05 | global batch size: 64 | lm loss: 6.453969E+00 | loss scale: 8192.0 | grad norm: 205582.781 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5637/ 159576 | consumed samples: 156016 | elapsed time per iteration (ms): 16349.2 | learning rate: 4.317E-05 | global batch size: 64 | lm loss: 6.386354E+00 | loss scale: 8192.0 | grad norm: 188662.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5638/ 159576 | consumed samples: 156080 | elapsed time per iteration (ms): 16437.2 | learning rate: 4.319E-05 | global batch size: 64 | lm loss: 6.458478E+00 | loss scale: 8192.0 | grad norm: 208129.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5639/ 159576 | consumed samples: 156144 | elapsed time per iteration (ms): 16478.4 | learning rate: 4.321E-05 | global batch size: 64 | lm loss: 6.361111E+00 | loss scale: 8192.0 | grad norm: 383224.074 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5640/ 159576 | consumed samples: 156208 | elapsed time per iteration (ms): 16543.3 | learning rate: 4.322E-05 | global batch size: 64 | lm loss: 6.470639E+00 | loss scale: 8192.0 | grad norm: 244281.048 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5641/ 159576 | consumed samples: 156272 | elapsed time per iteration (ms): 16418.6 | learning rate: 4.324E-05 | global batch size: 64 | lm loss: 6.453573E+00 | loss scale: 8192.0 | grad norm: 246555.042 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5642/ 159576 | consumed samples: 156336 | elapsed time per iteration (ms): 16347.0 | learning rate: 4.326E-05 | global batch size: 64 | lm loss: 6.416644E+00 | loss scale: 8192.0 | grad norm: 177394.161 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5643/ 159576 | consumed samples: 156400 | elapsed time per iteration (ms): 9564.0 | learning rate: 4.326E-05 | global batch size: 64 | lm loss: 6.433064E+00 | loss scale: 4096.0 | grad norm: 177394.161 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5644/ 159576 | consumed samples: 156464 | elapsed time per iteration (ms): 16246.5 | learning rate: 4.328E-05 | global batch size: 64 | lm loss: 6.334921E+00 | loss scale: 4096.0 | grad norm: 91031.712 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5645/ 159576 | consumed samples: 156528 | elapsed time per iteration (ms): 16410.8 | learning rate: 4.330E-05 | global batch size: 64 | lm loss: 6.398224E+00 | loss scale: 4096.0 | grad norm: 82899.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5646/ 159576 | consumed samples: 156592 | elapsed time per iteration (ms): 
16332.5 | learning rate: 4.331E-05 | global batch size: 64 | lm loss: 6.469447E+00 | loss scale: 4096.0 | grad norm: 93235.700 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5647/ 159576 | consumed samples: 156656 | elapsed time per iteration (ms): 16380.9 | learning rate: 4.333E-05 | global batch size: 64 | lm loss: 6.414939E+00 | loss scale: 4096.0 | grad norm: 98498.938 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5648/ 159576 | consumed samples: 156720 | elapsed time per iteration (ms): 16453.9 | learning rate: 4.335E-05 | global batch size: 64 | lm loss: 6.435335E+00 | loss scale: 4096.0 | grad norm: 110431.475 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5649/ 159576 | consumed samples: 156784 | elapsed time per iteration (ms): 16375.1 | learning rate: 4.337E-05 | global batch size: 64 | lm loss: 6.367991E+00 | loss scale: 4096.0 | grad norm: 112025.804 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5650/ 159576 | consumed samples: 156848 | elapsed time per iteration (ms): 16396.5 | learning rate: 4.338E-05 | global batch size: 64 | lm loss: 6.453450E+00 | loss scale: 4096.0 | grad norm: 142538.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5651/ 159576 | consumed samples: 156912 | elapsed time per iteration (ms): 16662.1 | learning rate: 4.340E-05 | global batch size: 64 | lm loss: 6.376512E+00 | loss scale: 4096.0 | grad norm: 104884.454 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5652/ 159576 | consumed samples: 156976 | elapsed time per iteration (ms): 16397.7 | learning rate: 4.342E-05 | global batch size: 64 | lm loss: 6.398083E+00 | loss scale: 4096.0 | grad norm: 97434.412 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5653/ 159576 | consumed samples: 157040 | elapsed time per iteration (ms): 16367.3 | learning rate: 4.344E-05 | global batch size: 64 | lm loss: 6.468301E+00 | loss scale: 4096.0 | grad norm: 189503.731 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5654/ 159576 | consumed samples: 157104 | elapsed time per iteration (ms): 16332.7 | learning rate: 4.346E-05 | global batch size: 64 | lm loss: 6.449702E+00 | loss scale: 4096.0 | grad norm: 101635.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5655/ 159576 | consumed samples: 157168 | elapsed time per iteration (ms): 16814.3 | learning rate: 4.347E-05 | global batch size: 64 | lm loss: 6.417078E+00 | loss scale: 4096.0 | grad norm: 163445.588 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5656/ 159576 | consumed samples: 157232 | elapsed time per iteration (ms): 16304.4 | learning rate: 4.349E-05 | global batch size: 64 | lm loss: 6.445296E+00 | loss scale: 4096.0 | grad norm: 90409.939 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5657/ 159576 | consumed samples: 157296 | elapsed time per iteration (ms): 16400.9 | learning rate: 4.351E-05 | global batch size: 64 | lm loss: 6.445564E+00 | loss scale: 4096.0 | grad norm: 81513.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | time (ms) iteration 5658/ 159576 | consumed samples: 157360 | elapsed time per iteration (ms): 16340.5 | learning rate: 4.353E-05 | global batch size: 64 | lm loss: 6.333720E+00 | loss scale: 4096.0 | grad norm: 134428.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5659/ 159576 | consumed samples: 157424 | elapsed time per iteration (ms): 16553.5 | learning rate: 4.354E-05 | global batch size: 64 | lm loss: 6.401426E+00 | loss scale: 4096.0 | grad norm: 106022.946 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5660/ 159576 | consumed samples: 157488 | elapsed time per iteration (ms): 16387.3 | learning rate: 4.356E-05 | global batch size: 64 | lm loss: 6.438997E+00 | loss scale: 4096.0 | grad norm: 83955.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5661/ 159576 | consumed samples: 157552 | elapsed time per iteration (ms): 16456.3 | learning rate: 4.358E-05 | global batch size: 64 | lm loss: 6.402083E+00 | loss scale: 4096.0 | grad norm: 85068.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5662/ 159576 | consumed samples: 157616 | elapsed time per iteration (ms): 16696.8 | learning rate: 4.360E-05 | global batch size: 64 | lm loss: 6.441435E+00 | loss scale: 4096.0 | grad norm: 101578.461 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5663/ 159576 | consumed samples: 157680 | elapsed time per iteration (ms): 16497.3 | learning rate: 4.362E-05 | global batch size: 64 | lm loss: 6.405056E+00 | loss scale: 4096.0 | grad norm: 90814.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5664/ 159576 | consumed samples: 157744 | elapsed time per iteration (ms): 16393.8 | learning rate: 4.363E-05 | global batch size: 64 | lm loss: 6.437488E+00 | loss scale: 4096.0 | grad norm: 99258.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5665/ 159576 | consumed samples: 157808 | elapsed time per iteration (ms): 16464.8 | learning rate: 4.365E-05 | global batch size: 64 | lm loss: 6.461691E+00 | loss scale: 4096.0 | grad norm: 150615.188 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5666/ 159576 | consumed samples: 157872 | elapsed time per iteration (ms): 16442.6 | learning rate: 4.367E-05 | global batch size: 64 | lm loss: 6.379485E+00 | loss scale: 4096.0 | grad norm: 87553.112 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5667/ 159576 | consumed samples: 157936 | elapsed time per iteration (ms): 16408.0 | learning rate: 4.369E-05 | global batch size: 64 | lm loss: 6.436778E+00 | loss scale: 4096.0 | grad norm: 86837.058 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5668/ 159576 | consumed samples: 158000 | elapsed time per iteration (ms): 16382.6 | learning rate: 4.370E-05 | global batch size: 64 | lm loss: 6.456222E+00 | loss scale: 4096.0 | grad norm: 81561.808 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5669/ 159576 | consumed samples: 158064 | elapsed time per iteration (ms): 16606.9 | learning rate: 4.372E-05 | global batch size: 64 | lm loss: 
6.394565E+00 | loss scale: 4096.0 | grad norm: 90655.669 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5670/ 159576 | consumed samples: 158128 | elapsed time per iteration (ms): 16482.0 | learning rate: 4.374E-05 | global batch size: 64 | lm loss: 6.388999E+00 | loss scale: 4096.0 | grad norm: 139861.145 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5671/ 159576 | consumed samples: 158192 | elapsed time per iteration (ms): 16430.2 | learning rate: 4.376E-05 | global batch size: 64 | lm loss: 6.348672E+00 | loss scale: 4096.0 | grad norm: 79933.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5672/ 159576 | consumed samples: 158256 | elapsed time per iteration (ms): 16343.5 | learning rate: 4.378E-05 | global batch size: 64 | lm loss: 6.358377E+00 | loss scale: 4096.0 | grad norm: 91907.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5673/ 159576 | consumed samples: 158320 | elapsed time per iteration (ms): 16738.6 | learning rate: 4.379E-05 | global batch size: 64 | lm loss: 6.397278E+00 | loss scale: 4096.0 | grad norm: 81347.015 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5674/ 159576 | consumed samples: 158384 | elapsed time per iteration (ms): 16377.1 | learning rate: 4.381E-05 | global batch size: 64 | lm loss: 6.330511E+00 | loss scale: 4096.0 | grad norm: 87623.840 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5675/ 159576 | consumed samples: 158448 | elapsed time per iteration (ms): 16376.8 | learning rate: 4.383E-05 | global batch size: 64 | lm loss: 6.400737E+00 | loss scale: 4096.0 | grad norm: 86243.502 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5676/ 159576 | consumed samples: 158512 | elapsed time per iteration (ms): 16407.2 | learning rate: 4.385E-05 | global batch size: 64 | lm loss: 6.373343E+00 | loss scale: 4096.0 | grad norm: 112233.960 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5677/ 159576 | consumed samples: 158576 | elapsed time per iteration (ms): 16504.3 | learning rate: 4.386E-05 | global batch size: 64 | lm loss: 6.340403E+00 | loss scale: 4096.0 | grad norm: 87545.481 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5678/ 159576 | consumed samples: 158640 | elapsed time per iteration (ms): 16469.6 | learning rate: 4.388E-05 | global batch size: 64 | lm loss: 6.483582E+00 | loss scale: 4096.0 | grad norm: 85898.534 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5679/ 159576 | consumed samples: 158704 | elapsed time per iteration (ms): 16363.2 | learning rate: 4.390E-05 | global batch size: 64 | lm loss: 6.384809E+00 | loss scale: 4096.0 | grad norm: 75822.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5680/ 159576 | consumed samples: 158768 | elapsed time per iteration (ms): 16705.5 | learning rate: 4.392E-05 | global batch size: 64 | lm loss: 6.360985E+00 | loss scale: 4096.0 | grad norm: 93411.572 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5681/ 159576 | consumed samples: 
158832 | elapsed time per iteration (ms): 16533.6 | learning rate: 4.393E-05 | global batch size: 64 | lm loss: 6.346332E+00 | loss scale: 4096.0 | grad norm: 98347.186 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5682/ 159576 | consumed samples: 158896 | elapsed time per iteration (ms): 16424.8 | learning rate: 4.395E-05 | global batch size: 64 | lm loss: 6.452760E+00 | loss scale: 4096.0 | grad norm: 113842.784 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5683/ 159576 | consumed samples: 158960 | elapsed time per iteration (ms): 16412.1 | learning rate: 4.397E-05 | global batch size: 64 | lm loss: 6.394449E+00 | loss scale: 4096.0 | grad norm: 225192.085 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5684/ 159576 | consumed samples: 159024 | elapsed time per iteration (ms): 16934.4 | learning rate: 4.399E-05 | global batch size: 64 | lm loss: 6.394941E+00 | loss scale: 4096.0 | grad norm: 81396.577 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5685/ 159576 | consumed samples: 159088 | elapsed time per iteration (ms): 16454.0 | learning rate: 4.401E-05 | global batch size: 64 | lm loss: 6.261321E+00 | loss scale: 4096.0 | grad norm: 86149.759 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5686/ 159576 | consumed samples: 159152 | elapsed time per iteration (ms): 16431.5 | learning rate: 4.402E-05 | global batch size: 64 | lm loss: 6.492159E+00 | loss scale: 4096.0 | grad norm: 119300.666 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5687/ 159576 | consumed samples: 159216 | elapsed time per iteration (ms): 16386.6 | learning rate: 4.404E-05 | global batch size: 64 | lm loss: 6.511878E+00 | loss scale: 4096.0 | grad norm: 91338.030 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5688/ 159576 | consumed samples: 159280 | elapsed time per iteration (ms): 16584.3 | learning rate: 4.406E-05 | global batch size: 64 | lm loss: 6.442091E+00 | loss scale: 4096.0 | grad norm: 127329.065 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5689/ 159576 | consumed samples: 159344 | elapsed time per iteration (ms): 16414.9 | learning rate: 4.408E-05 | global batch size: 64 | lm loss: 6.445393E+00 | loss scale: 4096.0 | grad norm: 74818.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5690/ 159576 | consumed samples: 159408 | elapsed time per iteration (ms): 16438.8 | learning rate: 4.409E-05 | global batch size: 64 | lm loss: 6.349149E+00 | loss scale: 4096.0 | grad norm: 90721.765 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5691/ 159576 | consumed samples: 159472 | elapsed time per iteration (ms): 16762.3 | learning rate: 4.411E-05 | global batch size: 64 | lm loss: 6.450273E+00 | loss scale: 4096.0 | grad norm: 84948.864 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5692/ 159576 | consumed samples: 159536 | elapsed time per iteration (ms): 16461.8 | learning rate: 4.413E-05 | global batch size: 64 | lm loss: 6.451497E+00 | loss scale: 4096.0 | grad norm: 160376.410 | num zeros: 0.0 | number 
of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5693/ 159576 | consumed samples: 159600 | elapsed time per iteration (ms): 16376.8 | learning rate: 4.415E-05 | global batch size: 64 | lm loss: 6.414182E+00 | loss scale: 4096.0 | grad norm: 64931.477 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5694/ 159576 | consumed samples: 159664 | elapsed time per iteration (ms): 16448.9 | learning rate: 4.417E-05 | global batch size: 64 | lm loss: 6.392116E+00 | loss scale: 4096.0 | grad norm: 82604.441 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5695/ 159576 | consumed samples: 159728 | elapsed time per iteration (ms): 16621.3 | learning rate: 4.418E-05 | global batch size: 64 | lm loss: 6.379553E+00 | loss scale: 4096.0 | grad norm: 96286.790 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5696/ 159576 | consumed samples: 159792 | elapsed time per iteration (ms): 16447.4 | learning rate: 4.420E-05 | global batch size: 64 | lm loss: 6.319911E+00 | loss scale: 4096.0 | grad norm: 113489.439 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5697/ 159576 | consumed samples: 159856 | elapsed time per iteration (ms): 16402.2 | learning rate: 4.422E-05 | global batch size: 64 | lm loss: 6.538674E+00 | loss scale: 4096.0 | grad norm: 88380.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5698/ 159576 | consumed samples: 159920 | elapsed time per iteration (ms): 16462.3 | learning rate: 4.424E-05 | global batch size: 64 | lm loss: 6.450464E+00 | loss scale: 4096.0 | grad norm: 86213.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5699/ 159576 | consumed samples: 159984 | elapsed time per iteration (ms): 16600.9 | learning rate: 4.425E-05 | global batch size: 64 | lm loss: 6.360521E+00 | loss scale: 4096.0 | grad norm: 102839.632 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5700/ 159576 | consumed samples: 160048 | elapsed time per iteration (ms): 16472.4 | learning rate: 4.427E-05 | global batch size: 64 | lm loss: 6.408503E+00 | loss scale: 4096.0 | grad norm: 93708.692 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-25 01:07:57] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition) [2021-09-25 01:07:57] PULSE: tr8-104B is running for 19:15:46 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8]) iteration 5701/ 159576 | consumed samples: 160112 | elapsed time per iteration (ms): 16355.6 | learning rate: 4.429E-05 | global batch size: 64 | lm loss: 6.383047E+00 | loss scale: 4096.0 | grad norm: 277390.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5702/ 159576 | consumed samples: 160176 | elapsed time per iteration (ms): 16761.7 | learning rate: 4.431E-05 | global batch size: 64 | lm loss: 6.450840E+00 | loss scale: 4096.0 | grad norm: 91541.343 
iteration 5703/ 159576 | consumed samples: 160240 | elapsed time per iteration (ms): 9560.9 | learning rate: 4.431E-05 | global batch size: 64 | lm loss: 6.493016E+00 | loss scale: 2048.0 | grad norm: 91541.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5704/ 159576 | consumed samples: 160304 | elapsed time per iteration (ms): 15881.2 | learning rate: 4.433E-05 | global batch size: 64 | lm loss: 6.513262E+00 | loss scale: 2048.0 | grad norm: 63292.650 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5705/ 159576 | consumed samples: 160368 | elapsed time per iteration (ms): 16396.1 | learning rate: 4.434E-05 | global batch size: 64 | lm loss: 6.341697E+00 | loss scale: 2048.0 | grad norm: 49175.756 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5706/ 159576 | consumed samples: 160432 | elapsed time per iteration (ms): 16742.1 | learning rate: 4.436E-05 | global batch size: 64 | lm loss: 6.376310E+00 | loss scale: 2048.0 | grad norm: 49500.870 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5707/ 159576 | consumed samples: 160496 | elapsed time per iteration (ms): 16502.9 | learning rate: 4.438E-05 | global batch size: 64 | lm loss: 6.305195E+00 | loss scale: 2048.0 | grad norm: 66863.451 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5708/ 159576 | consumed samples: 160560 | elapsed time per iteration (ms): 16427.2 | learning rate: 4.440E-05 | global batch size: 64 | lm loss: 6.338213E+00 | loss scale: 2048.0 | grad norm: 49886.489 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5709/ 159576 | consumed samples: 160624 | elapsed time per iteration (ms): 16430.3 | learning rate: 4.441E-05 | global batch size: 64 | lm loss: 6.403567E+00 | loss scale: 2048.0 | grad norm: 67050.774 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5710/ 159576 | consumed samples: 160688 | elapsed time per iteration (ms): 16701.6 | learning rate: 4.443E-05 | global batch size: 64 | lm loss: 6.365169E+00 | loss scale: 2048.0 | grad norm: 65553.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5711/ 159576 | consumed samples: 160752 | elapsed time per iteration (ms): 16495.7 | learning rate: 4.445E-05 | global batch size: 64 | lm loss: 6.437389E+00 | loss scale: 2048.0 | grad norm: 42948.956 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5712/ 159576 | consumed samples: 160816 | elapsed time per iteration (ms): 16396.0 | learning rate: 4.447E-05 | global batch size: 64 | lm loss: 6.359374E+00 | loss scale: 2048.0 | grad norm: 47459.652 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5713/ 159576 | consumed samples: 160880 | elapsed time per iteration (ms): 16399.1 | learning rate: 4.449E-05 | global batch size: 64 | lm loss: 6.384996E+00 | loss scale: 2048.0 | grad norm: 54873.542 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5714/ 159576 | consumed samples: 160944 | elapsed time per iteration (ms): 16655.8 | learning rate: 4.450E-05 | global batch size: 64 | lm loss: 6.407744E+00 | loss scale: 2048.0 | grad norm: 49484.496 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
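Note the loss scale halving from 4096.0 to 2048.0 at iteration 5703 above (with the grad norm reading repeating 5702's value), which is the signature of dynamic fp16 loss scaling backing off after an overflow. A minimal sketch of the usual rule, with illustrative constants; this is the generic mechanism, not the exact Megatron/DeepSpeed implementation:

# Standard dynamic loss scaling: halve on overflow, grow after a run of
# clean steps. Constants are illustrative assumptions.
class DynamicLossScaler:
    def __init__(self, init_scale=4096.0, growth_interval=1000, backoff=0.5):
        self.scale = init_scale
        self.growth_interval = growth_interval  # clean steps before doubling
        self.backoff = backoff                  # halve on overflow
        self.good_steps = 0

    def update(self, found_overflow: bool):
        if found_overflow:
            self.scale = max(1.0, self.scale * self.backoff)  # 4096 -> 2048
            self.good_steps = 0  # that step's gradients are discarded
        else:
            self.good_steps += 1
            if self.good_steps >= self.growth_interval:
                self.scale *= 2.0
                self.good_steps = 0

scaler = DynamicLossScaler()
scaler.update(found_overflow=True)
print(scaler.scale)  # 2048.0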
iteration 5715/ 159576 | consumed samples: 161008 | elapsed time per iteration (ms): 16395.3 | learning rate: 4.452E-05 | global batch size: 64 | lm loss: 6.596529E+00 | loss scale: 2048.0 | grad norm: 56205.082 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5716/ 159576 | consumed samples: 161072 | elapsed time per iteration (ms): 16464.0 | learning rate: 4.454E-05 | global batch size: 64 | lm loss: 6.421166E+00 | loss scale: 2048.0 | grad norm: 62635.742 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5717/ 159576 | consumed samples: 161136 | elapsed time per iteration (ms): 16725.6 | learning rate: 4.456E-05 | global batch size: 64 | lm loss: 6.470579E+00 | loss scale: 2048.0 | grad norm: 63421.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5718/ 159576 | consumed samples: 161200 | elapsed time per iteration (ms): 16562.5 | learning rate: 4.457E-05 | global batch size: 64 | lm loss: 6.431957E+00 | loss scale: 2048.0 | grad norm: 41629.913 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5719/ 159576 | consumed samples: 161264 | elapsed time per iteration (ms): 16447.6 | learning rate: 4.459E-05 | global batch size: 64 | lm loss: 6.372540E+00 | loss scale: 2048.0 | grad norm: 52749.135 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5720/ 159576 | consumed samples: 161328 | elapsed time per iteration (ms): 16436.0 | learning rate: 4.461E-05 | global batch size: 64 | lm loss: 6.376571E+00 | loss scale: 2048.0 | grad norm: 152378.164 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5721/ 159576 | consumed samples: 161392 | elapsed time per iteration (ms): 16522.7 | learning rate: 4.463E-05 | global batch size: 64 | lm loss: 6.346034E+00 | loss scale: 2048.0 | grad norm: 79170.187 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5722/ 159576 | consumed samples: 161456 | elapsed time per iteration (ms): 16447.7 | learning rate: 4.464E-05 | global batch size: 64 | lm loss: 6.379195E+00 | loss scale: 2048.0 | grad norm: 54035.991 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5723/ 159576 | consumed samples: 161520 | elapsed time per iteration (ms): 16383.8 | learning rate: 4.466E-05 | global batch size: 64 | lm loss: 6.410875E+00 | loss scale: 2048.0 | grad norm: 122622.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5724/ 159576 | consumed samples: 161584 | elapsed time per iteration (ms): 16762.9 | learning rate: 4.468E-05 | global batch size: 64 | lm loss: 6.426128E+00 | loss scale: 2048.0 | grad norm: 61346.953 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5725/ 159576 | consumed samples: 161648 | elapsed time per iteration (ms): 16455.6 | learning rate: 4.470E-05 | global batch size: 64 | lm loss: 6.440339E+00 | loss scale: 2048.0 | grad norm: 114917.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5726/ 159576 | consumed samples: 161712 | elapsed time per iteration (ms): 16491.5 | learning rate: 4.472E-05 | global batch size: 64 | lm loss: 6.229801E+00 | loss scale: 2048.0 | grad norm: 43861.570 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5727/ 159576 | consumed samples: 161776 | elapsed time per iteration (ms): 16434.9 | learning rate: 4.473E-05 | global batch size: 64 | lm loss: 6.503794E+00 | loss scale: 2048.0 | grad norm: 59176.822 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5728/ 159576 | consumed samples: 161840 | elapsed time per iteration (ms): 16686.0 | learning rate: 4.475E-05 | global batch size: 64 | lm loss: 6.415756E+00 | loss scale: 2048.0 | grad norm: 62124.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5729/ 159576 | consumed samples: 161904 | elapsed time per iteration (ms): 16403.6 | learning rate: 4.477E-05 | global batch size: 64 | lm loss: 6.457495E+00 | loss scale: 2048.0 | grad norm: 56507.999 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5730/ 159576 | consumed samples: 161968 | elapsed time per iteration (ms): 16426.6 | learning rate: 4.479E-05 | global batch size: 64 | lm loss: 6.469141E+00 | loss scale: 2048.0 | grad norm: 61746.631 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5731/ 159576 | consumed samples: 162032 | elapsed time per iteration (ms): 16455.5 | learning rate: 4.480E-05 | global batch size: 64 | lm loss: 6.459309E+00 | loss scale: 2048.0 | grad norm: 59449.114 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5732/ 159576 | consumed samples: 162096 | elapsed time per iteration (ms): 16649.1 | learning rate: 4.482E-05 | global batch size: 64 | lm loss: 6.402276E+00 | loss scale: 2048.0 | grad norm: 46335.687 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5733/ 159576 | consumed samples: 162160 | elapsed time per iteration (ms): 16461.8 | learning rate: 4.484E-05 | global batch size: 64 | lm loss: 6.519283E+00 | loss scale: 2048.0 | grad norm: 66042.113 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5734/ 159576 | consumed samples: 162224 | elapsed time per iteration (ms): 16320.8 | learning rate: 4.486E-05 | global batch size: 64 | lm loss: 6.357197E+00 | loss scale: 2048.0 | grad norm: 86317.077 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5735/ 159576 | consumed samples: 162288 | elapsed time per iteration (ms): 16817.7 | learning rate: 4.488E-05 | global batch size: 64 | lm loss: 6.412820E+00 | loss scale: 2048.0 | grad norm: 68051.158 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5736/ 159576 | consumed samples: 162352 | elapsed time per iteration (ms): 16374.0 | learning rate: 4.489E-05 | global batch size: 64 | lm loss: 6.409474E+00 | loss scale: 2048.0 | grad norm: 52474.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5737/ 159576 | consumed samples: 162416 | elapsed time per iteration (ms): 16279.5 | learning rate: 4.491E-05 | global batch size: 64 | lm loss: 6.432059E+00 | loss scale: 2048.0 | grad norm: 60932.044 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5738/ 159576 | consumed samples: 162480 | elapsed time per iteration (ms): 16405.5 | learning rate: 4.493E-05 | global batch size: 64 | lm loss: 6.389083E+00 | loss scale: 2048.0 | grad norm: 97554.805 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5739/ 159576 | consumed samples: 162544 | elapsed time per iteration (ms): 16881.2 | learning rate: 4.495E-05 | global batch size: 64 | lm loss: 6.352797E+00 | loss scale: 2048.0 | grad norm: 56410.885 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5740/ 159576 | consumed samples: 162608 | elapsed time per iteration (ms): 16465.8 | learning rate: 4.496E-05 | global batch size: 64 | lm loss: 6.400247E+00 | loss scale: 2048.0 | grad norm: 67543.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5741/ 159576 | consumed samples: 162672 | elapsed time per iteration (ms): 16430.8 | learning rate: 4.498E-05 | global batch size: 64 | lm loss: 6.361669E+00 | loss scale: 2048.0 | grad norm: 49133.819 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5742/ 159576 | consumed samples: 162736 | elapsed time per iteration (ms): 16371.1 | learning rate: 4.500E-05 | global batch size: 64 | lm loss: 6.415005E+00 | loss scale: 2048.0 | grad norm: 84089.923 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5743/ 159576 | consumed samples: 162800 | elapsed time per iteration (ms): 16700.6 | learning rate: 4.502E-05 | global batch size: 64 | lm loss: 6.365685E+00 | loss scale: 2048.0 | grad norm: 51630.988 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5744/ 159576 | consumed samples: 162864 | elapsed time per iteration (ms): 16325.3 | learning rate: 4.504E-05 | global batch size: 64 | lm loss: 6.440388E+00 | loss scale: 2048.0 | grad norm: 72309.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5745/ 159576 | consumed samples: 162928 | elapsed time per iteration (ms): 16329.9 | learning rate: 4.505E-05 | global batch size: 64 | lm loss: 6.466510E+00 | loss scale: 2048.0 | grad norm: 42690.447 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5746/ 159576 | consumed samples: 162992 | elapsed time per iteration (ms): 16621.4 | learning rate: 4.507E-05 | global batch size: 64 | lm loss: 6.487222E+00 | loss scale: 2048.0 | grad norm: 71804.170 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5747/ 159576 | consumed samples: 163056 | elapsed time per iteration (ms): 16495.0 | learning rate: 4.509E-05 | global batch size: 64 | lm loss: 6.362286E+00 | loss scale: 2048.0 | grad norm: 86678.801 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5748/ 159576 | consumed samples: 163120 | elapsed time per iteration (ms): 16346.4 | learning rate: 4.511E-05 | global batch size: 64 | lm loss: 6.356483E+00 | loss scale: 2048.0 | grad norm: 59964.749 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5749/ 159576 | consumed samples: 163184 | elapsed time per iteration (ms): 16441.6 | learning rate: 4.512E-05 | global batch size: 64 | lm loss: 6.417390E+00 | loss scale: 2048.0 | grad norm: 50380.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5750/ 159576 | consumed samples: 163248 | elapsed time per iteration (ms): 16658.5 | learning rate: 4.514E-05 | global batch size: 64 | lm loss: 6.274541E+00 | loss scale: 2048.0 | grad norm: 39059.607 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5751/ 159576 | consumed samples: 163312 | elapsed time per iteration (ms): 16405.5 | learning rate: 4.516E-05 | global batch size: 64 | lm loss: 6.367218E+00 | loss scale: 2048.0 | grad norm: 51183.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5752/ 159576 | consumed samples: 163376 | elapsed time per iteration (ms): 16320.2 | learning rate: 4.518E-05 | global batch size: 64 | lm loss: 6.344701E+00 | loss scale: 2048.0 | grad norm: 36962.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5753/ 159576 | consumed samples: 163440 | elapsed time per iteration (ms): 16390.0 | learning rate: 4.520E-05 | global batch size: 64 | lm loss: 6.400953E+00 | loss scale: 2048.0 | grad norm: 66022.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5754/ 159576 | consumed samples: 163504 | elapsed time per iteration (ms): 16546.1 | learning rate: 4.521E-05 | global batch size: 64 | lm loss: 6.378292E+00 | loss scale: 2048.0 | grad norm: 51492.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5755/ 159576 | consumed samples: 163568 | elapsed time per iteration (ms): 16433.9 | learning rate: 4.523E-05 | global batch size: 64 | lm loss: 6.447009E+00 | loss scale: 2048.0 | grad norm: 67150.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5756/ 159576 | consumed samples: 163632 | elapsed time per iteration (ms): 16359.3 | learning rate: 4.525E-05 | global batch size: 64 | lm loss: 6.393310E+00 | loss scale: 2048.0 | grad norm: 47124.929 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5757/ 159576 | consumed samples: 163696 | elapsed time per iteration (ms): 16714.1 | learning rate: 4.527E-05 | global batch size: 64 | lm loss: 6.428847E+00 | loss scale: 2048.0 | grad norm: 73984.124 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5758/ 159576 | consumed samples: 163760 | elapsed time per iteration (ms): 16285.5 | learning rate: 4.528E-05 | global batch size: 64 | lm loss: 6.410369E+00 | loss scale: 2048.0 | grad norm: 51894.933 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5759/ 159576 | consumed samples: 163824 | elapsed time per iteration (ms): 16346.5 | learning rate: 4.530E-05 | global batch size: 64 | lm loss: 6.361977E+00 | loss scale: 2048.0 | grad norm: 46022.549 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5760/ 159576 | consumed samples: 163888 | elapsed time per iteration (ms): 16363.4 | learning rate: 4.532E-05 | global batch size: 64 | lm loss: 6.411450E+00 | loss scale: 2048.0 | grad norm: 62804.958 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
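Across this stretch the learning rate rises by a near-constant ~1.8E-08 per iteration (~2.8E-10 per consumed sample), i.e. the run still looks to be on a linear warmup ramp. A generic sketch of that schedule; max_lr and warmup_samples below are illustrative assumptions fit to the slope above, not the run's actual configuration:

def warmup_lr(consumed_samples, max_lr=6e-5, warmup_samples=218_000):
    # Linear ramp from 0 to max_lr over the first warmup_samples samples,
    # then flat (real schedules typically decay afterwards).
    if consumed_samples < warmup_samples:
        return max_lr * consumed_samples / warmup_samples
    return max_lr

print(f"{warmup_lr(159_600):.3E}")  # 4.393E-05 -- the same ballpark as the
                                    # 4.415E-05 logged at iteration 5693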
iteration 5761/ 159576 | consumed samples: 163952 | elapsed time per iteration (ms): 16576.6 | learning rate: 4.534E-05 | global batch size: 64 | lm loss: 6.492290E+00 | loss scale: 2048.0 | grad norm: 91376.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5762/ 159576 | consumed samples: 164016 | elapsed time per iteration (ms): 16429.0 | learning rate: 4.536E-05 | global batch size: 64 | lm loss: 6.351690E+00 | loss scale: 2048.0 | grad norm: 56460.123 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5763/ 159576 | consumed samples: 164080 | elapsed time per iteration (ms): 16419.8 | learning rate: 4.537E-05 | global batch size: 64 | lm loss: 6.388021E+00 | loss scale: 2048.0 | grad norm: 48184.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5764/ 159576 | consumed samples: 164144 | elapsed time per iteration (ms): 16346.0 | learning rate: 4.539E-05 | global batch size: 64 | lm loss: 6.500803E+00 | loss scale: 2048.0 | grad norm: 47702.715 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5765/ 159576 | consumed samples: 164208 | elapsed time per iteration (ms): 16601.8 | learning rate: 4.541E-05 | global batch size: 64 | lm loss: 6.377601E+00 | loss scale: 2048.0 | grad norm: 52558.168 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5766/ 159576 | consumed samples: 164272 | elapsed time per iteration (ms): 16306.8 | learning rate: 4.543E-05 | global batch size: 64 | lm loss: 6.348913E+00 | loss scale: 2048.0 | grad norm: 75335.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5767/ 159576 | consumed samples: 164336 | elapsed time per iteration (ms): 16391.8 | learning rate: 4.544E-05 | global batch size: 64 | lm loss: 6.287434E+00 | loss scale: 2048.0 | grad norm: 51886.097 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5768/ 159576 | consumed samples: 164400 | elapsed time per iteration (ms): 16644.5 | learning rate: 4.546E-05 | global batch size: 64 | lm loss: 6.409395E+00 | loss scale: 2048.0 | grad norm: 59368.924 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5769/ 159576 | consumed samples: 164464 | elapsed time per iteration (ms): 16355.1 | learning rate: 4.548E-05 | global batch size: 64 | lm loss: 6.376360E+00 | loss scale: 2048.0 | grad norm: 45775.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5770/ 159576 | consumed samples: 164528 | elapsed time per iteration (ms): 16317.3 | learning rate: 4.550E-05 | global batch size: 64 | lm loss: 6.428416E+00 | loss scale: 2048.0 | grad norm: 53234.486 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5771/ 159576 | consumed samples: 164592 | elapsed time per iteration (ms): 16327.7 | learning rate: 4.551E-05 | global batch size: 64 | lm loss: 6.374567E+00 | loss scale: 2048.0 | grad norm: 44963.056 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5772/ 159576 | consumed samples: 164656 | elapsed time per iteration (ms): 16674.7 | learning rate: 4.553E-05 | global batch size: 64 | lm loss: 6.357097E+00 | loss scale: 2048.0 | grad norm: 47484.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5773/ 159576 | consumed samples: 164720 | elapsed time per iteration (ms): 16463.9 | learning rate: 4.555E-05 | global batch size: 64 | lm loss: 6.398357E+00 | loss scale: 2048.0 | grad norm: 41638.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5774/ 159576 | consumed samples: 164784 | elapsed time per iteration (ms): 16348.7 | learning rate: 4.557E-05 | global batch size: 64 | lm loss: 6.351582E+00 | loss scale: 2048.0 | grad norm: 54903.850 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5775/ 159576 | consumed samples: 164848 | elapsed time per iteration (ms): 16736.5 | learning rate: 4.559E-05 | global batch size: 64 | lm loss: 6.367338E+00 | loss scale: 2048.0 | grad norm: 43171.778 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5776/ 159576 | consumed samples: 164912 | elapsed time per iteration (ms): 16420.4 | learning rate: 4.560E-05 | global batch size: 64 | lm loss: 6.386267E+00 | loss scale: 2048.0 | grad norm: 68637.095 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5777/ 159576 | consumed samples: 164976 | elapsed time per iteration (ms): 16467.1 | learning rate: 4.562E-05 | global batch size: 64 | lm loss: 6.368368E+00 | loss scale: 2048.0 | grad norm: 47557.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5778/ 159576 | consumed samples: 165040 | elapsed time per iteration (ms): 16383.6 | learning rate: 4.564E-05 | global batch size: 64 | lm loss: 6.360928E+00 | loss scale: 2048.0 | grad norm: 48661.439 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5779/ 159576 | consumed samples: 165104 | elapsed time per iteration (ms): 16795.3 | learning rate: 4.566E-05 | global batch size: 64 | lm loss: 6.286585E+00 | loss scale: 2048.0 | grad norm: 41957.074 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5780/ 159576 | consumed samples: 165168 | elapsed time per iteration (ms): 16414.6 | learning rate: 4.567E-05 | global batch size: 64 | lm loss: 6.329445E+00 | loss scale: 2048.0 | grad norm: 58532.760 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5781/ 159576 | consumed samples: 165232 | elapsed time per iteration (ms): 16413.2 | learning rate: 4.569E-05 | global batch size: 64 | lm loss: 6.447413E+00 | loss scale: 2048.0 | grad norm: 58971.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5782/ 159576 | consumed samples: 165296 | elapsed time per iteration (ms): 16345.1 | learning rate: 4.571E-05 | global batch size: 64 | lm loss: 6.367276E+00 | loss scale: 2048.0 | grad norm: 62853.125 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5783/ 159576 | consumed samples: 165360 | elapsed time per iteration (ms): 16700.8 | learning rate: 4.573E-05 | global batch size: 64 | lm loss: 6.394166E+00 | loss scale: 2048.0 | grad norm: 104426.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5784/ 159576 | consumed samples: 165424 | elapsed time per iteration (ms): 16276.5 | learning rate: 4.575E-05 | global batch size: 64 | lm loss: 6.447882E+00 | loss scale: 2048.0 | grad norm: 50564.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5785/ 159576 | consumed samples: 165488 | elapsed time per iteration (ms): 16423.7 | learning rate: 4.576E-05 | global batch size: 64 | lm loss: 6.341421E+00 | loss scale: 2048.0 | grad norm: 126331.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5786/ 159576 | consumed samples: 165552 | elapsed time per iteration (ms): 16792.0 | learning rate: 4.578E-05 | global batch size: 64 | lm loss: 6.384687E+00 | loss scale: 2048.0 | grad norm: 54058.867 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5787/ 159576 | consumed samples: 165616 | elapsed time per iteration (ms): 16388.2 | learning rate: 4.580E-05 | global batch size: 64 | lm loss: 6.392807E+00 | loss scale: 2048.0 | grad norm: 59371.923 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5788/ 159576 | consumed samples: 165680 | elapsed time per iteration (ms): 16392.6 | learning rate: 4.582E-05 | global batch size: 64 | lm loss: 6.457485E+00 | loss scale: 2048.0 | grad norm: 65736.175 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5789/ 159576 | consumed samples: 165744 | elapsed time per iteration (ms): 16338.9 | learning rate: 4.583E-05 | global batch size: 64 | lm loss: 6.370594E+00 | loss scale: 2048.0 | grad norm: 86846.852 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5790/ 159576 | consumed samples: 165808 | elapsed time per iteration (ms): 16857.0 | learning rate: 4.585E-05 | global batch size: 64 | lm loss: 6.412526E+00 | loss scale: 2048.0 | grad norm: 77325.810 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5791/ 159576 | consumed samples: 165872 | elapsed time per iteration (ms): 16398.4 | learning rate: 4.587E-05 | global batch size: 64 | lm loss: 6.412295E+00 | loss scale: 2048.0 | grad norm: 50166.463 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5792/ 159576 | consumed samples: 165936 | elapsed time per iteration (ms): 16290.5 | learning rate: 4.589E-05 | global batch size: 64 | lm loss: 6.380277E+00 | loss scale: 2048.0 | grad norm: 48226.590 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5793/ 159576 | consumed samples: 166000 | elapsed time per iteration (ms): 16371.0 | learning rate: 4.591E-05 | global batch size: 64 | lm loss: 6.359699E+00 | loss scale: 2048.0 | grad norm: 65168.886 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5794/ 159576 | consumed samples: 166064 | elapsed time per iteration (ms): 16645.3 | learning rate: 4.592E-05 | global batch size: 64 | lm loss: 6.321030E+00 | loss scale: 2048.0 | grad norm: 52186.470 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5795/ 159576 | consumed samples: 166128 | elapsed time per iteration (ms): 16469.4 | learning rate: 4.594E-05 | global batch size: 64 | lm loss: 6.393083E+00 | loss scale: 2048.0 | grad norm: 55272.030 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5796/ 159576 | consumed samples: 166192 | elapsed time per iteration (ms): 16425.9 | learning rate: 4.596E-05 | global batch size: 64 | lm loss: 6.374780E+00 | loss scale: 2048.0 | grad norm: 53939.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5797/ 159576 | consumed samples: 166256 | elapsed time per iteration (ms): 16770.7 | learning rate: 4.598E-05 | global batch size: 64 | lm loss: 6.376060E+00 | loss scale: 2048.0 | grad norm: 62276.052 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5798/ 159576 | consumed samples: 166320 | elapsed time per iteration (ms): 16339.0 | learning rate: 4.599E-05 | global batch size: 64 | lm loss: 6.463357E+00 | loss scale: 2048.0 | grad norm: 55276.460 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5799/ 159576 | consumed samples: 166384 | elapsed time per iteration (ms): 16400.6 | learning rate: 4.601E-05 | global batch size: 64 | lm loss: 6.364144E+00 | loss scale: 2048.0 | grad norm: 46941.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5800/ 159576 | consumed samples: 166448 | elapsed time per iteration (ms): 16328.3 | learning rate: 4.603E-05 | global batch size: 64 | lm loss: 6.412081E+00 | loss scale: 2048.0 | grad norm: 61281.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5801/ 159576 | consumed samples: 166512 | elapsed time per iteration (ms): 16791.0 | learning rate: 4.605E-05 | global batch size: 64 | lm loss: 6.396990E+00 | loss scale: 2048.0 | grad norm: 90543.167 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5802/ 159576 | consumed samples: 166576 | elapsed time per iteration (ms): 16555.9 | learning rate: 4.607E-05 | global batch size: 64 | lm loss: 6.358585E+00 | loss scale: 2048.0 | grad norm: 43097.920 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5803/ 159576 | consumed samples: 166640 | elapsed time per iteration (ms): 16465.5 | learning rate: 4.608E-05 | global batch size: 64 | lm loss: 6.493999E+00 | loss scale: 2048.0 | grad norm: 45567.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5804/ 159576 | consumed samples: 166704 | elapsed time per iteration (ms): 16436.4 | learning rate: 4.610E-05 | global batch size: 64 | lm loss: 6.533109E+00 | loss scale: 2048.0 | grad norm: 127288.085 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5805/ 159576 | consumed samples: 166768 | elapsed time per iteration (ms): 16549.3 | learning rate: 4.612E-05 | global batch size: 64 | lm loss: 6.379089E+00 | loss scale: 2048.0 | grad norm: 48002.691 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5806/ 159576 | consumed samples: 166832 | elapsed time per iteration (ms): 16407.1 | learning rate: 4.614E-05 | global batch size: 64 | lm loss: 6.365424E+00 | loss scale: 2048.0 | grad norm: 49891.608 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5807/ 159576 | consumed samples: 166896 | elapsed time per iteration (ms): 16379.2 | learning rate: 4.615E-05 | global batch size: 64 | lm loss: 6.476014E+00 | loss scale: 2048.0 | grad norm: 47532.881 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5808/ 159576 | consumed samples: 166960 | elapsed time per iteration (ms): 16753.6 | learning rate: 4.617E-05 | global batch size: 64 | lm loss: 6.354483E+00 | loss scale: 2048.0 | grad norm: 56392.704 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5809/ 159576 | consumed samples: 167024 | elapsed time per iteration (ms): 16393.4 | learning rate: 4.619E-05 | global batch size: 64 | lm loss: 6.519560E+00 | loss scale: 2048.0 | grad norm: 44344.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5810/ 159576 | consumed samples: 167088 | elapsed time per iteration (ms): 16492.5 | learning rate: 4.621E-05 | global batch size: 64 | lm loss: 6.408142E+00 | loss scale: 2048.0 | grad norm: 49620.831 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5811/ 159576 | consumed samples: 167152 | elapsed time per iteration (ms): 16428.1 | learning rate: 4.622E-05 | global batch size: 64 | lm loss: 6.376643E+00 | loss scale: 2048.0 | grad norm: 54930.966 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5812/ 159576 | consumed samples: 167216 | elapsed time per iteration (ms): 16603.5 | learning rate: 4.624E-05 | global batch size: 64 | lm loss: 6.446056E+00 | loss scale: 2048.0 | grad norm: 49991.934 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5813/ 159576 | consumed samples: 167280 | elapsed time per iteration (ms): 16423.7 | learning rate: 4.626E-05 | global batch size: 64 | lm loss: 6.503972E+00 | loss scale: 2048.0 | grad norm: 48324.994 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5814/ 159576 | consumed samples: 167344 | elapsed time per iteration (ms): 16392.6 | learning rate: 4.628E-05 | global batch size: 64 | lm loss: 6.483917E+00 | loss scale: 2048.0 | grad norm: 49344.656 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5815/ 159576 | consumed samples: 167408 | elapsed time per iteration (ms): 16437.6 | learning rate: 4.630E-05 | global batch size: 64 | lm loss: 6.359298E+00 | loss scale: 2048.0 | grad norm: 46826.938 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5816/ 159576 | consumed samples: 167472 | elapsed time per iteration (ms): 16791.2 | learning rate: 4.631E-05 | global batch size: 64 | lm loss: 6.477077E+00 | loss scale: 2048.0 | grad norm: 80606.642 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5817/ 159576 | consumed samples: 167536 | elapsed time per iteration (ms): 16448.9 | learning rate: 4.633E-05 | global batch size: 64 | lm loss: 6.378170E+00 | loss scale: 2048.0 | grad norm: 50159.917 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5818/ 159576 | consumed samples: 167600 | elapsed time per iteration (ms): 16473.7 | learning rate: 4.635E-05 | global batch size: 64 | lm loss: 6.336848E+00 | loss scale: 2048.0 | grad norm: 68729.538 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5819/ 159576 | consumed samples: 167664 | elapsed time per iteration (ms): 16753.1 | learning rate: 4.637E-05 | global batch size: 64 | lm loss: 6.448166E+00 | loss scale: 2048.0 | grad norm: 53348.776 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5820/ 159576 | consumed samples: 167728 | elapsed time per iteration (ms): 16453.7 | learning rate: 4.638E-05 | global batch size: 64 | lm loss: 6.433999E+00 | loss scale: 2048.0 | grad norm: 56781.530 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5821/ 159576 | consumed samples: 167792 | elapsed time per iteration (ms): 16425.7 | learning rate: 4.640E-05 | global batch size: 64 | lm loss: 6.397796E+00 | loss scale: 2048.0 | grad norm: 51600.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5822/ 159576 | consumed samples: 167856 | elapsed time per iteration (ms): 16451.4 | learning rate: 4.642E-05 | global batch size: 64 | lm loss: 6.353134E+00 | loss scale: 2048.0 | grad norm: 49519.612 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5823/ 159576 | consumed samples: 167920 | elapsed time per iteration (ms): 16634.5 | learning rate: 4.644E-05 | global batch size: 64 | lm loss: 6.402969E+00 | loss scale: 2048.0 | grad norm: 52985.835 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5824/ 159576 | consumed samples: 167984 | elapsed time per iteration (ms): 16465.1 | learning rate: 4.646E-05 | global batch size: 64 | lm loss: 6.411339E+00 | loss scale: 2048.0 | grad norm: 61452.738 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5825/ 159576 | consumed samples: 168048 | elapsed time per iteration (ms): 16470.5 | learning rate: 4.647E-05 | global batch size: 64 | lm loss: 6.406009E+00 | loss scale: 2048.0 | grad norm: 58959.118 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5826/ 159576 | consumed samples: 168112 | elapsed time per iteration (ms): 16992.0 | learning rate: 4.649E-05 | global batch size: 64 | lm loss: 6.401187E+00 | loss scale: 2048.0 | grad norm: 61581.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5827/ 159576 | consumed samples: 168176 | elapsed time per iteration (ms): 16478.9 | learning rate: 4.651E-05 | global batch size: 64 | lm loss: 6.463509E+00 | loss scale: 2048.0 | grad norm: 62591.609 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 5827 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
[2021-09-25 01:42:45,269] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/global_step5827/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 5827 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
time (ms) | save-checkpoint: 18278.37
[exiting program after 1190.1195526679357 minutes] datetime: 2021-09-25 01:42:58
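The checkpoint block above shows the DeepSpeed on-disk layout (global_step5827/mp_rank_00_model_states.pt) and an 18.3 s save. A minimal, generic sketch of a timed save in that layout; the toy model stands in for the real engine, and none of this is the run's actual checkpointing code:

import os, time
import torch

def save_checkpoint(model, optimizer, iteration, root="checkpoints"):
    # DeepSpeed-like directory layout: <root>/global_step<N>/...
    tag = f"global_step{iteration}"
    os.makedirs(os.path.join(root, tag), exist_ok=True)
    t0 = time.time()
    torch.save(
        {"iteration": iteration,
         "model": model.state_dict(),
         "optimizer": optimizer.state_dict()},
        os.path.join(root, tag, "mp_rank_00_model_states.pt"),
    )
    # Mirrors the "time (ms) | save-checkpoint: ..." line in the log above.
    print(f"time (ms) | save-checkpoint: {(time.time() - t0) * 1000:.2f}")

model = torch.nn.Linear(8, 8)
save_checkpoint(model, torch.optim.SGD(model.parameters(), lr=0.1), 5827)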
"/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:43:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:43:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:43:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File 
"/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:43:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:43:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:43:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:43:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 
979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:43:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:43:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:43:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit 
raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:43:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:44:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:44:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:44:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File 
"/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:44:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:44:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:44:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after 
shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:44:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:44:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:44:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:44:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File 
"/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:44:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:44:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:45:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after 
shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:45:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:45:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:45:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:45:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File 
"/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:45:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:45:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:45:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after 
shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:45:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:45:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:45:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:45:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File 
"/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:46:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:46:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:46:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after 
shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:46:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:46:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:46:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:46:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File 
"/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:46:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:46:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:46:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after 
shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:46:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:46:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:47:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:47:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File 
"/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:47:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:47:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:47:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after 
shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:47:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:47:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:47:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:47:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File 
"/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:47:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:47:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:47:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after 
shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:48:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:48:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:48:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:48:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File 
"/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:48:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:48:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:48:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after 
shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:48:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:48:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:48:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:48:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File 
"/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:48:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:49:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:49:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after 
shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:49:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:49:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:49:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:49:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File 
"/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:49:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:49:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:49:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after 
shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:49:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:49:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:49:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:50:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File 
"/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:50:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:50:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:50:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after 
shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:50:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:50:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:50:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:50:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File 
"/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:50:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:50:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:50:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after 
shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:50:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:51:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:51:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:51:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File 
"/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:51:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:51:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:51:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after 
shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:51:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:51:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:51:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:51:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File 
"/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:51:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:51:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:52:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after 
shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:52:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:52:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:52:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:52:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File 
"/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:52:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:52:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:52:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after 
shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:52:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:52:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:52:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:52:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File 
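The flood above is a single failure mode repeated once per tick: codecarbon's BaseEmissionsTracker runs _measure_power as an apscheduler interval job, and once the job step began tearing down, the thread pool behind apscheduler's default executor was apparently already shut down, so every submit attempt failed. A minimal sketch of the Python-level cause, using only the stdlib (the names here are illustrative, not taken from codecarbon itself):

# sketch: any submit() to a ThreadPoolExecutor after shutdown() raises exactly
# the RuntimeError spammed in the log above
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=1)
pool.shutdown()                            # pool now rejects all new work

try:
    pool.submit(print, "measure power")    # stands in for the scheduled _measure_power job
except RuntimeError as e:
    print(e)                               # -> cannot schedule new futures after shutdown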
"/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown srun: Job step aborted: Waiting up to 62 seconds for job step to finish. Killing subprocess 48057 Killing subprocess 31112 Killing subprocess 48058 Killing subprocess 48059 Killing subprocess 48061 Killing subprocess 31113 Killing subprocess 31114 Killing subprocess 31115 Main process received SIGTERM, exiting Main process received SIGTERM, exiting Killing subprocess 77714 Killing subprocess 28453 Killing subprocess 38924 Killing subprocess 54762 Killing subprocess 77715 Killing subprocess 77716 Killing subprocess 77717 Main process received SIGTERM, exiting Killing subprocess 22919 Killing subprocess 28454 Killing subprocess 28455 Killing subprocess 28456 Main process received SIGTERM, exiting Killing subprocess 54763 Killing subprocess 54764 Killing subprocess 54765 Killing subprocess 38925 Killing subprocess 38926 Killing subprocess 38927 Killing subprocess 22920 Main process received SIGTERM, exiting Killing subprocess 22921 Killing subprocess 22923 Main process received SIGTERM, exiting Main process received SIGTERM, exiting Killing subprocess 19021 Killing subprocess 45366 Killing subprocess 19022 Killing subprocess 19023 Killing subprocess 19024 Main process received SIGTERM, exiting Killing subprocess 45367 Killing subprocess 45368 Killing subprocess 45370 Main process received SIGTERM, exiting Killing subprocess 48440 Killing subprocess 81370 Killing subprocess 48441 Killing subprocess 48442 Killing subprocess 48443 Main process received SIGTERM, exiting Killing subprocess 81371 Killing subprocess 81372 Killing subprocess 81373 Main process received SIGTERM, exiting Killing subprocess 65810 Killing subprocess 65811 Killing subprocess 65812 Killing subprocess 65813 Main process received SIGTERM, exiting [2021-09-25 02:08:19] PULSE: tr8-104B is waiting to be scheduled (1165978_[1-10%1] on 'gpu_p13' partition) ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
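Next comes the DeepSpeed C++/CUDA extension op report, which every rank prints at startup (the raw log interleaves all the copies; a single cleaned copy is kept below). The same check can be run from one process with DeepSpeed's ds_report command, or programmatically; a sketch, assuming the op_builder interface of the DeepSpeed version installed in this conda env:

# sketch: ask DeepSpeed whether individual ops can be JIT-compiled on this box
# (assumption: the installed DeepSpeed exposes these builders under ops.op_builder)
from deepspeed.ops.op_builder import CPUAdamBuilder, FusedAdamBuilder

for builder in (CPUAdamBuilder(), FusedAdamBuilder()):
    print(builder.NAME, "compatible:", builder.is_compatible())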
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
op name................ ................installed installedinstalled.................. ..compatibleinstalled.. compatiblecompatible--------------------------------------------------.. ----------------------------------------------------------------------------------------------------compatible -------------------------------------------------- cpu_adam ...............cpu_adam cpu_adam [YES]cpu_adam ............... .....................[YES]............... [YES] [YES] [OKAY]...... ............ [OKAY] [OKAY] [OKAY] fused_adam ............. [NO] ....... fused_adam[OKAY] fused_adam .............fused_adam [NO]............. fused_lamb....... ............. [NO] .............[OKAY] [NO] ....... [NO] ....... fused_lamb .......[OKAY] [OKAY] ............. [OKAY] [NO] .......fused_lamb fused_lamb[OKAY] .......................... [NO][NO] .............. [OKAY][OKAY]sparse_attn ............ [NO] .......sparse_attn [OKAY]............ [NO] ....... sparse_attnsparse_attntransformer[OKAY] .................................... [NO] transformer[NO] [NO]................... ....... .......[NO] .......[OKAY][OKAY] [OKAY][OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformerstochastic_transformer transformer............stochastic_transformer. ............ [NO]. [NO] [NO] [NO]....... ....... ..............[OKAY][OKAY] [OKAY][OKAY] async_io ............... [NO] ....... [NO] stochastic_transformer stochastic_transformer. [NO]. .......[NO] [OKAY]....... [OKAY] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninjaninjaninjaninja .................. .................................... .................. [OKAY] [OKAY][OKAY][OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op name op name op name op name ................................................ installed ................installed installed installed ...... ..compatiblecompatiblecompatible --------------------------------------------------compatible-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- cpu_adam cpu_adam...............cpu_adam ...............cpu_adam............... [YES] [YES] ............... [YES]...... ...... [YES]......[OKAY][OKAY] ...... [OKAY] [OKAY] fused_adam ............. fused_adam[NO] fused_adam............. fused_adam ....... .............[NO] ............. .......[OKAY][NO][NO] [OKAY].......fused_lamb....... 
[OKAY][OKAY].............fused_lamb [NO].............fused_lambfused_lamb [NO].................... .............[NO] .......[NO][OKAY] ....... [OKAY] ....... [OKAY] [OKAY] sparse_attn sparse_attn............sparse_attn sparse_attn........................[NO] ............[NO] [NO] .......[NO] ....... ....... .......[OKAY] [OKAY] [OKAY] [OKAY] transformer ............transformertransformer transformer ........................[NO]............ [NO] [NO] ....... [NO]....... ....... [OKAY]....... [OKAY] [OKAY][OKAY] stochastic_transformer stochastic_transformerstochastic_transformer .stochastic_transformer . [NO] . [NO]. ....... [NO][NO][OKAY]....... ..............[OKAY] [OKAY][OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja ------------------------------------------------------------------------------------------------------------------------------------------------------ --------------------------------------------------DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. 
Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY] [OKAY] -------------------------------------------------- ---------------------------------------------------------------------------------------------------- --------------------------------------------------op name op name................ op name................op name installed installed.................. ................ .. compatibleinstalled installed compatible..-------------------------------------------------- ..--------------------------------------------------compatible compatible -------------------------------------------------- -------------------------------------------------- cpu_adamcpu_adam .............................. [YES]cpu_adam[YES] cpu_adam ........................... [OKAY]...............[OKAY] [YES][YES] ............ [OKAY][OKAY] fused_adam fused_adam............. .............[NO] [NO]....... fused_adamfused_adam.......[OKAY] [OKAY]............. .............[NO] [NO]fused_lambfused_lamb....... ............. ............. [OKAY]....... [NO] [NO][OKAY]....... fused_lamb.......[OKAY] [OKAY]fused_lamb............. .............[NO] [NO]....... .......[OKAY] [OKAY] sparse_attn sparse_attn............ ............[NO] [NO]....... sparse_attn.......sparse_attn[OKAY] ............[OKAY]............ transformer[NO][NO] transformer................... ................... [NO][OKAY] [OKAY] [NO] ....... transformer transformer [OKAY]....... ............ ............[OKAY][NO] stochastic_transformer [NO] ....... stochastic_transformer ........ [OKAY] .[NO] [OKAY][NO] .......stochastic_transformer ....... [OKAY] [OKAY]stochastic_transformer. [NO] ........ [NO][OKAY] ....... [OKAY] -------------------------------------------------- ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja --------------------------------------------------JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report--------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninja JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ------------------------------------------------------------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report--------------------------------------------------DeepSpeed C++/CUDA extension op report --------------------------------------------------JIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja ninjaninjaninjaninja ...................................................... .................. [OKAY] [OKAY][OKAY][OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op nameop nameop name op name................ ................ ................ ................installed installedinstalled installed .... .. ..compatiblecompatible compatiblecompatible---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adam ...............cpu_adam cpu_adamcpu_adam[YES] ............... .............................. ...... [YES][YES][OKAY] [YES]...... ...... [OKAY] ...... [OKAY] [OKAY] fused_adam ............. [NO]fused_adam fused_adam.................... fused_adam .............[NO][OKAY] .............[NO]....... [NO] fused_lamb.......[OKAY] ............. ....... [OKAY] [NO] fused_lamb [OKAY]fused_lamb ............. ....... [NO].............[OKAY] .......fused_lamb[NO] [OKAY].................... [NO][OKAY] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attnsparse_attntransformer sparse_attn.................................... [NO][NO]............ [NO] .......[NO]....... .......[OKAY][OKAY]....... [OKAY][OKAY] transformerstochastic_transformer transformer transformer............ .........................[NO] [NO] [NO] .......[NO] ....... [OKAY].............. [OKAY][OKAY] [OKAY] stochastic_transformer stochastic_transformer stochastic_transformer. ..[NO] [NO][NO]....... ..............[OKAY] [OKAY][OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- DeepSpeed C++/CUDA extension op report --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja .................................... ....................................[OKAY][OKAY] [OKAY]--------------------------------------------------[OKAY]-------------------------------------------------- op name --------------------------------------------------................ --------------------------------------------------op name op nameinstalled .................................. op name installedcompatible installed ..................-------------------------------------------------- installed..compatible ..compatible -------------------------------------------------- compatible --------------------------------------------------cpu_adam -------------------------------------------------- ...............cpu_adam [YES]............... ......[YES] cpu_adam [OKAY]cpu_adam ...... -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system ...............[OKAY]............... [YES][YES] ............ [OKAY][OKAY] meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja--------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja fused_adam ............. [NO] fused_adam....... .............[OKAY] [NO]fused_adamfused_adam fused_lamb ....... .......................... ............. 
[OKAY][NO][NO][NO] ..............fused_lamb ....... [OKAY] [OKAY] .............[OKAY] [NO] fused_lamb .................... fused_lamb[OKAY] .............[NO]sparse_attn [NO]................... .......[NO][OKAY] .......[OKAY] [OKAY]sparse_attn ............ [NO] transformer....... ............[OKAY] [NO] sparse_attn.......sparse_attn transformer[OKAY] ............ ........................ [NO][NO]stochastic_transformer[NO] ............... ....... [OKAY][OKAY][OKAY][NO] .......stochastic_transformer transformer [OKAY] transformer ............. ............[NO][NO] [NO].............. .......[OKAY][OKAY] [OKAY] stochastic_transformer stochastic_transformer . [NO]. .......[NO] [OKAY]....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] -------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------ op name op name op nameop name................ ................ ................ 
................installed installed ..installed installed ..compatible ....--------------------------------------------------compatible compatible compatible-------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adam ............... [YES] cpu_adam...... cpu_adam............... cpu_adam[OKAY] [YES].............................. ......[YES][YES] ......[OKAY]...... fused_adam [OKAY][OKAY]............. [NO] ....... [OKAY] fused_adam .............fused_lamb [NO]fused_adam............. fused_adam .......[NO] ............. .................... [OKAY] [NO] [NO] [OKAY] ..............fused_lamb [OKAY][OKAY]............. [NO] fused_lamb.......fused_lambsparse_attn .............[OKAY]......................... [NO][NO][NO] ..................... [OKAY][OKAY][OKAY] sparse_attn ............transformer [NO]............ .......[NO] [OKAY]....... [OKAY] sparse_attnsparse_attn transformerstochastic_transformer............ ........................ [NO] . [NO][NO] .............. [NO] .......[OKAY] [OKAY] [OKAY]....... [OKAY]stochastic_transformer transformer transformer ......................... [NO][NO][NO] ....... [OKAY] ninjaninjaninjaninja .................. ....................................[OKAY].................. .............. stochastic_transformer[OKAY][OKAY] [OKAY][OKAY]--------------------------------------------------[OKAY] --------------------------------------------------op name-------------------------------------------------- -------------------------------------------------- op name . [NO]stochastic_transformer ....... .[OKAY] [NO] ....... [OKAY] ................ ................op nameop nameinstalled ................ installed .................. installed .. compatible installed.. compatible --------------------------------------------------compatible--------------------------------------------------.. --------------------------------------------------compatible -------------------------------------------------- cpu_adamcpu_adam .............................. cpu_adam[YES][YES] cpu_adam ..................... ...... ............... [OKAY][YES] [OKAY] ......[YES] [OKAY]...... [OKAY] fused_adamfused_adam .......................... [NO][NO] fused_adam.............. fused_adam............. [OKAY][OKAY].............[NO] .......[NO]fused_lamb [OKAY]fused_lamb.................... [NO]............. fused_lamb[OKAY].......[NO] [OKAY]............. ....... [NO][OKAY] .......fused_lamb [OKAY] ninjaninjaninjaninja ...................................................... .................. [OKAY][OKAY][OKAY][OKAY] ............. [NO] ....... [OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op nameop name sparse_attn ............sparse_attn [NO]............ .......[NO] sparse_attn [OKAY] ....... ............ [OKAY][NO]transformer op name ................................ op name................ installed installed..installed ................ .. compatible.. installed compatible -------------------------------------------------- sparse_attn...................transformer [OKAY]............[NO]............ .......[NO] [NO] transformer[OKAY] ....... ..compatible -------------------------------------------------- compatible-------------------------------------------------- -------------------------------------------------- ....... ............ 
[OKAY] stochastic_transformer [OKAY][NO] cpu_adam cpu_adam............... ...............cpu_adam[YES] [YES]......cpu_adam............... [OKAY] ......[YES] ........ stochastic_transformer transformer [NO][OKAY] .................... stochastic_transformer [NO][NO] [OKAY] ....... ............... [OKAY] ...... .[OKAY] ....... [NO] [OKAY]....... [OKAY] [YES] [OKAY] ...... [OKAY]fused_adam ............. [NO]fused_adam .................... fused_adam [OKAY] [NO] stochastic_transformer . [NO] ....... [OKAY] ............. .......[NO]fused_lamb fused_adam [OKAY]............. ....... [NO].............[OKAY] fused_lamb....... .............[NO]fused_lamb[OKAY] [NO] .................... .......[NO][OKAY] [OKAY] ....... [OKAY] sparse_attnfused_lamb ......................... [NO] [NO]....... .......sparse_attnsparse_attn[OKAY] ........................[OKAY] [NO][NO] transformer ....... ....... ............ [OKAY][OKAY][NO] ....... transformer[OKAY] transformer ............ sparse_attn............[NO]stochastic_transformer .......[NO] ............[OKAY]. .......[NO] [NO] [OKAY]stochastic_transformer ....... ........[OKAY] stochastic_transformer[OKAY][NO] ........ [NO][OKAY]transformer ................... [OKAY] [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report -------------------------------------------------- ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ 
[OKAY][OKAY][OKAY][OKAY] ---------------------------------------------------------------------------------------------------- -------------------------------------------------- --------------------------------------------------op name op name op nameop name ................ ................................installed ................ installed installedinstalled.. ..compatible ....compatible-------------------------------------------------- compatiblecompatible -------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adam ............... [YES]cpu_adam ...............cpu_adam...... cpu_adam ............... [YES][OKAY] ............... [YES] ...... [YES] ...... [OKAY] ...... [OKAY] [OKAY] fused_adam ............. [NO] ....... [OKAY]fused_adamfused_adam fused_adam .............fused_lamb.......................... [NO].............[NO] [NO] ....... .......[NO]....... [OKAY].......[OKAY][OKAY] [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. fused_lamb fused_lambfused_lamb............. .............[NO]............. [NO] ....... [NO] ....... [OKAY]sparse_attn .......[OKAY]............ [OKAY][NO] async_io ............... [NO] ....... [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] transformersparse_attn ........................ sparse_attn [NO][NO]sparse_attn ...................................... [OKAY][OKAY][NO][NO] .............. transformer[OKAY][OKAY]stochastic_transformer utils .................. [YES] ...... [OKAY] ............. transformer[NO] transformer[NO]................... ................... [OKAY][NO] [OKAY] [NO] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- .......stochastic_transformer .......[OKAY]. [OKAY][NO] stochastic_transformer....... [OKAY]stochastic_transformer. [NO] ........ [NO][OKAY] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utils .................. utils[YES] ........................ [YES][OKAY] ...... [OKAY] quantizer ..............quantizer [NO].............. .......[NO] [OKAY]....... [OKAY] -------------------------------------------------- -------------------------------------------------- ninjaninjaninjaninja ...................................................... .................. [OKAY][OKAY][OKAY][OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op name op name op name op name ................................ ................ ................ installedinstalled installed..installed.. compatible.... compatible --------------------------------------------------compatiblecompatible -------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adam ............... cpu_adamcpu_adam[YES] cpu_adam............... ............... ......[YES] ............... [OKAY][YES] ...... [YES]...... 
[OKAY]......[OKAY] [OKAY] fused_adam ............. [NO] fused_adamfused_adam.......fused_adam .............[OKAY].......................... [NO][NO][NO]fused_lamb .................................. [NO][OKAY][OKAY] [OKAY]....... fused_lamb[OKAY]fused_lambfused_lamb ....................................... [NO] [NO] [NO] ....... ....... ....... [OKAY] [OKAY] [OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attnsparse_attn transformer sparse_attn............ ........................ ............[NO] [NO] [NO][NO]....... .....................[OKAY] [OKAY][OKAY] [OKAY]transformer transformer............ ............[NO] stochastic_transformertransformer [NO] ....... ............ ........ [OKAY] [NO][NO][OKAY] .............. stochastic_transformer [OKAY] [OKAY] stochastic_transformer . [NO]. stochastic_transformer....... [NO][OKAY] ........ [NO][OKAY] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja async_io ............... [NO] ....... [NO] -------------------------------------------------- DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report-------------------------------------------------- ----------------------------------------------------------------------------------------------------JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report transformer_inference .. [NO] ....... [OKAY] ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninjaninjaninjaninja ........................................................................ 
[OKAY][OKAY][OKAY] [OKAY] -------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------ op name op name................op name op name ................ installed ................................installed .. installed installed ..compatible .. compatible ..-------------------------------------------------- compatible--------------------------------------------------compatible -------------------------------------------------- -------------------------------------------------- cpu_adamcpu_adam ...............cpu_adamcpu_adam............... [YES] .............................. [YES] ......[YES]......[YES] [OKAY] [OKAY]............ [OKAY][OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. fused_adam .............fused_adam [NO].............fused_adamfused_adam [NO].................... [NO]............. .......[OKAY][NO]....... [OKAY].......[OKAY] fused_lamb async_io ............... [NO] ....... [NO] fused_lamb[OKAY].............fused_lamb .............[NO]............. [NO] fused_lamb .......[NO]....... [OKAY].............[OKAY]....... [NO][OKAY] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] sparse_attn sparse_attn............ sparse_attn............[NO] [NO]...................sparse_attn .......[NO][OKAY]............ [NO] [OKAY]....... quantizer .............. [NO] ....... [OKAY] ....... transformer [OKAY] [OKAY]............ -------------------------------------------------- transformer [NO]............ transformertransformer ....... [NO] ............ ............[OKAY] ....... [NO] [NO] [OKAY]stochastic_transformer ....... ........[OKAY] stochastic_transformer [OKAY] [NO] ........ stochastic_transformer[OKAY][NO]stochastic_transformer ......... [OKAY][NO] [NO] .............. [OKAY] [OKAY] ninjaninjaninjaninja ...................................................... .................. [OKAY][OKAY][OKAY][OKAY] ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- op name  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. op name................op name op name................installed................ ..installed................installed ..compatibleinstalled.. compatible..compatible-------------------------------------------------- async_ioasync_io .............................. [NO][NO] .............. [NO][NO] compatible-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- transformer_inference transformer_inference.. ..[NO] [NO]....... .......[OKAY] [OKAY] cpu_adam ............... [YES] ...... cpu_adamcpu_adam[OKAY] utils utils.................. ..................[YES] [YES]...... ......[OKAY] [OKAY] cpu_adam ............... ............... ............... [YES] [YES] [YES] ...... ...... ......fused_adam [OKAY] [OKAY] ............. [OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] [NO] ....... 
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meets the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops require ninja
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
[WARNING] async_io requires the libraries ['libaio-dev'], but they are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
[NO].......[OKAY] .......[OKAY] [OKAY] quantizer .............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer_inference-------------------------------------------------- transformer_inference.. ..[NO] [NO]....... .......[OKAY] sparse_attn sparse_attntransformer............ sparse_attn ........................ [NO] [NO]............ [NO] .............. [NO] ....... [OKAY][OKAY]....... [OKAY] [OKAY] [OKAY]transformerstochastic_transformer utils .................. utils[YES] ........................ [YES][OKAY] ...... [OKAY] transformer ............ transformer ............[NO]. ............[NO].......[NO] [NO] [OKAY]....... .............. [OKAY][OKAY][OKAY] quantizer ..............quantizer [NO].............. .......[NO] [OKAY]....... [OKAY] -------------------------------------------------- -------------------------------------------------- stochastic_transformer stochastic_transformer.stochastic_transformer [NO]. ........[NO] [OKAY][NO] .............. [OKAY][OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op report --------------------------------------------------JIT compiled ops requires ninja -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninjaJIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY] [OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op nameop name op nameop name................ ................ installed................installed................ installed.... installed .. compatible compatible.. compatible -------------------------------------------------- --------------------------------------------------compatible-------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES]cpu_adam ......cpu_adam...............cpu_adam ...............[OKAY][YES]............... [YES]......[YES] ......[OKAY]...... [OKAY][OKAY] fused_adam ............. [NO] ....... [OKAY]fused_adam ............. 
[NO]fused_adamfused_lamb fused_adam .................... ............. [OKAY] .............[NO] [NO] [NO].............. fused_lamb ....... [OKAY][OKAY] ............. [OKAY] [NO] fused_lamb....... fused_lamb ............. [OKAY] .............[NO] sparse_attn.......[NO] [OKAY]................... [NO][OKAY] ....... [OKAY] sparse_attn ............transformer [NO]............ .......[NO] sparse_attn[OKAY]....... [OKAY]............sparse_attn transformer[NO]............ .......stochastic_transformer[NO] ............ [OKAY] ........[NO] [OKAY][NO]....... transformer.......[OKAY] ............transformer[OKAY] [NO]stochastic_transformer............ .......[NO] .[OKAY] [NO]....... .......[OKAY] stochastic_transformer[OKAY] .stochastic_transformer [NO] ........ [OKAY][NO] ....... [OKAY] ninjaninjaninjaninja ........................................................................ [OKAY] [OKAY][OKAY] [OKAY] -------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------op name ................op nameop nameop name installed................................................ ..installedinstalledinstalled .... compatible..compatible compatiblecompatible---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adam cpu_adamcpu_adam...............cpu_adam ..............................[YES]............... [YES][YES][YES] ...... .................. [OKAY] [OKAY][OKAY][OKAY] fused_adam ............. fused_adamfused_adamfused_adam[NO] ................................. ............. [OKAY] [NO] [NO] [NO]fused_lamb.............. .................... [OKAY] [OKAY] [OKAY][NO] .......fused_lamb [OKAY]fused_lambfused_lamb ....................................... [NO] [NO] [NO] ....... ....... .......[OKAY][OKAY] [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformersparse_attn sparse_attn............ sparse_attn ............ ............[NO] ...................[NO][NO] [NO][OKAY].............. [OKAY] ....... [OKAY] stochastic_transformer [OKAY] transformer.transformer [NO] transformer........................ .......[NO]............ [NO] [OKAY]....... [NO].......[OKAY] .......[OKAY] [OKAY] stochastic_transformerstochastic_transformer stochastic_transformer .. .[NO][NO] .......[NO]....... [OKAY].......[OKAY] [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utils utils.................. ..................[YES] [YES]...... ......[OKAY] [OKAY] quantizer quantizer.............. ..............[NO] [NO]....... .......[OKAY] [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ---------------------------------------------------------------------------------------------------- ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report--------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninjaJIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ...............async_io [NO]............... .......[NO] ....... [NO][NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY] [OKAY] async_io ............... [NO] ....... [NO] utils ..................utils [YES].................. ......[YES] [OKAY]...... transformer_inference .. [NO] ....... [OKAY] [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. quantizer[NO] ....... [OKAY] .............. [NO] .......-------------------------------------------------- quantizer .............. [NO] ....... [OKAY] [OKAY] -------------------------------------------------- -------------------------------------------------- ninjaninjaninjaninja .................. .................. .................................... [OKAY] [OKAY][OKAY][OKAY]-------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------ op name op name................ op nameop name................ installed................................ installed .. installed..installed compatible compatible .. 
..-------------------------------------------------- -------------------------------------------------- compatible compatible ---------------------------------------------------------------------------------------------------- cpu_adamcpu_adam ..............................cpu_adamcpu_adam [YES]...............[YES]............... ...... [YES]...... [YES][OKAY] [OKAY] ...... ...... [OKAY][OKAY] fused_adam ............. fused_adam[NO] ....................fused_adam fused_adam [OKAY] [NO] .......................... [NO].......fused_lamb [NO] ....... ....................[OKAY][OKAY] [NO][OKAY] fused_lamb....... fused_lamb .............[OKAY].............fused_lamb [NO].............[NO] .......[NO]....... [OKAY][OKAY]....... sparse_attn [OKAY]............ [NO] ....... [OKAY] transformer sparse_attnsparse_attn............ ............ [NO] sparse_attn [NO]............ ....... ............ .......[NO][OKAY] [NO] ....... [OKAY].......stochastic_transformer[OKAY] [OKAY].transformer transformer [NO] ............transformer................... [NO] ............[NO] [OKAY] ..............[NO] ....... [OKAY] [OKAY][OKAY] stochastic_transformer stochastic_transformerstochastic_transformer. [NO]. ........[NO] [NO].......[OKAY] .......[OKAY] [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] async_io ............... [NO] ....... [NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer_inference .. [NO] ....... [OKAY] async_io ............... [NO] ....... [NO]async_io ............... [NO] ....... [NO] utils .................. [YES] ...... [OKAY] transformer_inference .. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] transformer_inference ..utils [NO].................. .......[YES] [OKAY]...... -------------------------------------------------- [OKAY] utilsquantizer ................................ [YES][NO] ............. [OKAY][OKAY] quantizer --------------------------------------------------.............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... 
[NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] async_io async_io............... ...............[NO] [NO]....... .......[NO] [NO] quantizer .............. [NO] ....... [OKAY] transformer_inference transformer_inference.. ..[NO] [NO]....... .......[OKAY] [OKAY] -------------------------------------------------- utils utils.................. ..................[YES] [YES]...... ......[OKAY] [OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- ninjaninjaninjaninja .................. .................................... .................. [OKAY] [OKAY][OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- -------------------------------------------------- op name--------------------------------------------------op name op name ................................ op name installed................ installed ................ .. installed installed..compatible compatible....-------------------------------------------------- --------------------------------------------------compatiblecompatible ---------------------------------------------------------------------------------------------------- cpu_adam ............... [YES]cpu_adam ..................... cpu_adam cpu_adam[OKAY] [YES] .................................... [YES] [YES][OKAY] ............fused_adam [OKAY].............[OKAY] [NO] ....... [OKAY]fused_adam ............. [NO]fused_adam fused_lamb ....... ............. .............fused_adam [OKAY] [NO][NO]............. fused_lamb ....... [NO]....... ............. [OKAY] ....... [OKAY][NO] [OKAY]....... [OKAY]fused_lamb fused_lamb .......................... [NO] [NO]....... sparse_attn....... ............[OKAY][OKAY] sparse_attn[NO] ................... [NO][OKAY] ....... [OKAY] transformer ............transformersparse_attn sparse_attn ........................[NO] [NO]...................[NO] [OKAY][NO].............. .......[OKAY][OKAY] stochastic_transformer [OKAY] .transformerstochastic_transformer [NO] transformer ............. ................... [NO] [OKAY][NO] [NO] ....... ..............[OKAY] [OKAY][OKAY] stochastic_transformerstochastic_transformer .. [NO][NO] .............. [OKAY][OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
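Every rank prints this same report at startup; it is the table DeepSpeed's `ds_report` utility produces. Each op is either pre-built ([YES] in the installed column) or deferred to JIT compilation, and the compatible column records whether the system has the dependencies to JIT-build that op. Only async_io fails the check here, because libaio-dev is not present on the compute nodes; that op matters only if NVMe offload is used. The same probe can be run by hand. A minimal sketch, assuming the op-builder layout of the DeepSpeed 0.4.x line used for this training (builder class names may differ in other releases):

```python
# Minimal sketch, assuming the DeepSpeed 0.4.x op_builder layout;
# builder class names may differ in other releases.
from deepspeed.ops.op_builder import (
    AsyncIOBuilder,
    CPUAdamBuilder,
    FusedAdamBuilder,
    FusedLambBuilder,
    SparseAttnBuilder,
    TransformerBuilder,
)

for builder in (CPUAdamBuilder(), FusedAdamBuilder(), FusedLambBuilder(),
                SparseAttnBuilder(), TransformerBuilder(), AsyncIOBuilder()):
    # is_compatible() runs the same dependency checks the report relies on;
    # AsyncIOBuilder returns False until `apt install libaio-dev` is done.
    print(builder.name, "compatible:", builder.is_compatible())
```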
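Each rank also prints a `DeepSpeed general environment info:` block, one copy of which follows, pinning down the exact software stack: torch 1.8.1 built against CUDA 11.1, nvcc 11.2, and DeepSpeed 0.4.2+bc17042 from the big-science branch, whose wheel was compiled for torch 1.8/CUDA 11.1. The nvcc toolkit (11.2) sits one minor version ahead of torch's CUDA (11.1), a mismatch the DeepSpeed build checks accept on CUDA 11, as the [OKAY] compatibility statuses above confirm. The key fields can be recovered from a live interpreter; a minimal sketch, using only standard torch/deepspeed attributes plus the nvcc CLI:

```python
# Minimal sketch: recover the main fields of the environment block from a
# running interpreter. torch.version.cuda is the CUDA version torch was
# compiled against; nvcc may report a different (compatible) toolkit.
import subprocess
import torch
import deepspeed

print("torch install path ...", torch.__path__)
print("torch version ........", torch.__version__)      # 1.8.1 here
print("torch cuda version ...", torch.version.cuda)     # 11.1 here
print("deepspeed info .......", deepspeed.__version__)  # 0.4.2+bc17042 here
nvcc = subprocess.run(["nvcc", "--version"], capture_output=True, text=True)
print("nvcc version .........",
      next(line for line in nvcc.stdout.splitlines() if "release" in line))
```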
Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja JIT compiled ops requires ninja JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report-------------------------------------------------- ---------------------------------------------------------------------------------------------------- -------------------------------------------------- utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ------------------------------------------------------------------------------------------------------------------------------------------------------ JIT compiled ops requires ninja JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at -------------------------------------------------- runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO]utils ......................... [OKAY][YES] ...... [OKAY] async_ioasync_io .............................. [NO][NO] .............. [NO][NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. quantizer ..............utils [NO].................. .......[YES] [OKAY]...... [OKAY] transformer_inference transformer_inference.. ..[NO] [NO]....... .......[OKAY] [OKAY] async_io ............... [NO] ....... [NO] -------------------------------------------------- utils ..................utils [YES].................. ......[YES] [OKAY]...... [OKAY] transformer_inference .. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] quantizer .............. quantizer[NO] ..................... [NO][OKAY] utils .................. [YES] ...... [OKAY] -------------------------------------------------- ....... [OKAY]-------------------------------------------------- -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninjaninjaninjaninja .................. .................................... ..................[OKAY] [OKAY][OKAY][OKAY] -------------------------------------------------- -------------------------------------------------- ----------------------------------------------------------------------------------------------------op nameop name ................op name................ op name installed installed ................ .................. .. installedinstalledcompatiblecompatible ..--------------------------------------------------..-------------------------------------------------- compatiblecompatible ---------------------------------------------------------------------------------------------------- cpu_adam ............... cpu_adam[YES] ..................... [YES]cpu_adam[OKAY]cpu_adam ...... ............... ............... [OKAY] [YES][YES] ............ [OKAY]fused_adam [OKAY] ............. [NO] .......fused_adam [OKAY]............. [NO] fused_adam.......fused_adamfused_lamb .............[OKAY].......................... [NO][NO][NO] .......fused_lamb.............. [OKAY] ............. [OKAY] [OKAY] [NO] .......fused_lamb fused_lamb [OKAY] ............. ............. [NO] sparse_attn [NO] ....... 
............ ....... [OKAY] [NO] [OKAY] ....... sparse_attn[OKAY] ............ [NO] transformer....... ............[OKAY] [NO] .......sparse_attn transformersparse_attn [OKAY] ........................ ............ [NO][NO] stochastic_transformer[NO] .............. ........[OKAY][OKAY] [NO][OKAY] transformer stochastic_transformer....... transformer............[OKAY]. ............[NO][NO] [NO].............. .......[OKAY][OKAY] [OKAY] stochastic_transformer stochastic_transformer . .[NO] [NO]....... .......[OKAY] [OKAY] DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path ............... ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ...............async_io [NO] ...................... [NO][NO] ....... [NO] transformer_inference ..transformer_inference [NO].. .......[NO] [OKAY]....... [OKAY] utils utils.................. ..................[YES] [YES]...... ......[OKAY] [OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- ninjaninjaninjaninja .................. .................. 
....................................[OKAY] [OKAY][OKAY] [OKAY]-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- op name op name op name op name................................................ installedinstalled installed.................. ....compatibleinstalled compatible..--------------------------------------------------compatible compatible---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adam ............... cpu_adam[YES]cpu_adamcpu_adam ................................................... [YES][YES][OKAY][YES] ............ ...... [OKAY] [OKAY] [OKAY] fused_adam ............. [NO] .......fused_adam fused_adam fused_adam[OKAY]............. ..........................fused_lamb[NO] [NO]....................[NO] [NO] [OKAY].............. ....... [OKAY] [OKAY][OKAY] fused_lamb DeepSpeed general environment info: .............fused_lamb fused_lamb [NO]............. .................... [NO][NO][OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] sparse_attn .......................... [OKAY][OKAY][NO] torch version .................... 1.8.1 ....... [OKAY] sparse_attn ............transformer [NO]............ .......[NO] sparse_attnsparse_attn[OKAY] torch cuda version ............... 11.1 ....... ............[OKAY]transformer............ nvcc version ..................... 11.2 ............[NO][NO] stochastic_transformer ....... [NO]....... ........[OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] [OKAY][OKAY][NO] transformer ................... transformerstochastic_transformer[OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 [NO]. ...................[NO] [NO][OKAY]....... .......[OKAY] [OKAY]stochastic_transformer . stochastic_transformer[NO] ........ [OKAY][NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install path ...............torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']1.8.1 torch cuda version torch version............... 11.1.................... nvcc version1.8.1 ..................... 11.2torch cuda version deepspeed install path............... ........... 11.1['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] DeepSpeed general environment info:deepspeed infonvcc version ................... .....................0.4.2+bc17042, bc17042, big-science 11.2deepspeed wheel compiled w. torch install path...... deepspeed install path ...............torch 1.8, cuda 11.1 ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] deepspeed info torch version................... .................... 0.4.2+bc17042, bc17042, big-science1.8.1 deepspeed wheel compiled w.torch cuda version ..................... 11.1torch 1.8, cuda 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. DeepSpeed general environment info: DeepSpeed general environment info:  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] async_io ............... [NO]async_io ...................... [NO][NO] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ....... [NO] torch version .................... 1.8.1 transformer_inference .. transformer_inference [NO].. .......[NO] [OKAY]....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. torch cuda version ............... 11.1 torch version .................... 1.8.1 utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] torch cuda version ............... 11.1 quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] async_io ............... [NO] ....... [NO] nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] nvcc version ..................... 11.2 ---------------------------------------------------------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] utils .................. [YES] ...... [OKAY] deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 quantizer .............. [NO] ....... [OKAY] DeepSpeed general environment info:DeepSpeed general environment info: DeepSpeed general environment info: -------------------------------------------------- torch install path torch install path............... ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']torch version torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] .................... torch version1.8.1 .................... 1.8.1torch cuda version ............... torch cuda version11.1 ...............nvcc version 11.1..................... nvcc version11.2 ..................... deepspeed install path11.2 torch version .................... 1.8.1 ........... deepspeed install path ...........['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] torch cuda version ............... 11.1 ...................deepspeed info 0.4.2+bc17042, bc17042, big-science................... 0.4.2+bc17042, bc17042, big-sciencedeepspeed wheel compiled w. deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. utils[NO] ......................... [OKAY] [YES]-------------------------------------------------- ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO]utils ......................... [OKAY] [YES] ...... [OKAY] utils ..................quantizer [YES] .................... [NO][OKAY] ....... [OKAY] quantizer --------------------------------------------------.............. 
[NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path ............... ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
JIT compiled ops requires ninja---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] async_io async_io............... [NO]............... .......[NO] [NO]....... [NO] quantizer .............. [NO] ....... [OKAY] transformer_inference ..transformer_inference [NO].. .......[NO] [OKAY]....... -------------------------------------------------- [OKAY] utils ..................utils [YES].................. ......[YES] [OKAY]...... [OKAY] quantizer ..............quantizer [NO].............. .......[NO] [OKAY]....... [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
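This report is printed by every rank at launch, which is why it floods the log. The async_io op reports [NO] because the libaio development headers are missing on the nodes; the warning itself names the fix. A minimal sketch, assuming apt-based root access (a shared cluster would typically route this through a sysadmin or a conda-packaged libaio instead):

    sudo apt install libaio-dev
    ds_report   # re-run DeepSpeed's compatibility report; async_io should flip to [OKAY]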
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
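The deepspeed info line (0.4.2+bc17042, bc17042, big-science) shows DeepSpeed running from a source checkout of a big-science branch rather than a released wheel, consistent with the DeepSpeed-big-science install path above. A sketch of how such an editable install is typically produced (the clone URL is an assumption; only the branch name and directory come from the report):

    git clone -b big-science https://github.com/microsoft/DeepSpeed DeepSpeed-big-science
    cd DeepSpeed-big-science
    pip install -e .   # the reported version then carries the <sha> and <branch> suffixes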
[WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... async_io[NO] ...................... [NO][NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY]transformer_inference .. [NO] ....... [OKAY]utils .................. [YES] ...... utils[OKAY] .................. [YES] quantizer...... ..............[OKAY] [NO] ....... [OKAY] quantizer .............. [NO]-------------------------------------------------- ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. /bin/sh: line 0: type: git: not found async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io async_io............... [NO]............... .......[NO] [NO]....... [NO] transformer_inference ..transformer_inference [NO].. .......[NO] [OKAY]....... [OKAY] utils .................. utils[YES] ........................ [YES][OKAY] ...... [OKAY]quantizer .............. quantizer[NO] ..................... [NO][OKAY] ....... [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. 
-------------------------------------------------- DeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report-------------------------------------------------- async_io ............... async_io[NO] ...................... [NO][NO] ....... [NO] NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.---------------------------------------------------------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja --------------------------------------------------JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. transformer_inference ..transformer_inference [NO].. .......[NO] [OKAY]....... -------------------------------------------------- JIT compiled ops requires ninja [OKAY] utils .................. [YES]utils ........................ [OKAY][YES] ...... [OKAY] quantizer .............. quantizer[NO] ..................... [NO][OKAY] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io async_io............... ...............[NO] [NO]....... .......[NO] [NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io transformer_inference............... ..[NO] [NO]....... .......[NO] [OKAY] utils .................. [YES] ...... [OKAY]transformer_inference .. [NO]quantizer ..................... [OKAY][NO] ....... [OKAY] utils ..................-------------------------------------------------- [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found DeepSpeed general environment info:DeepSpeed general environment info: torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** .................... 1.8.1torch version ....................torch cuda version 1.8.1............... 11.1 torch cuda version nvcc version............... .....................11.1 11.2 nvcc version deepspeed install path..................... 
...........11.2 ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed install path ...........deepspeed info ...................['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] 0.4.2+bc17042, bc17042, big-science deepspeed infodeepspeed wheel compiled w. ......................... 0.4.2+bc17042, bc17042, big-sciencetorch 1.8, cuda 11.1 deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninjaninjaninjaninja .................. .................. ..................[OKAY][OKAY].................. [OKAY]----------------------------------------------------------------------------------------------------[OKAY] --------------------------------------------------op nameop name --------------------------------------------------................................op name installedinstalled................op name .... ................ compatibleinstalled compatible installed..-------------------------------------------------- ..--------------------------------------------------compatible compatible -------------------------------------------------- -------------------------------------------------- cpu_adam ...............cpu_adam [YES]............... ......cpu_adam[YES]cpu_adam [OKAY]............... /bin/sh: line 0: type: git: not found ............... [YES][YES] .................. [OKAY][OKAY][OKAY] fused_adam ............. [NO] ....... [OKAY] fused_adamfused_adam fused_lamb.......................... .............[NO]fused_adam[NO] [NO] ....... ....... .............[OKAY] ....... [NO][OKAY] [OKAY].......fused_lamb ............. [NO] [OKAY]fused_lamb....... .............[OKAY]fused_lamb sparse_attn[NO] ................................ [NO][OKAY][NO] ....... [OKAY]sparse_attn ............transformer .......[NO]............ .......[NO]sparse_attn[OKAY] ............ .......[OKAY] [NO][OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer....... ............[OKAY] stochastic_transformer[NO] async_io ............... [NO] ....... [NO] transformer........ [NO]............[OKAY]sparse_attn .......[NO]............ stochastic_transformer [OKAY]....... **** Git info for Megatron: git_hash=unknown git_branch=unknown **** transformer_inference .. [NO] ....... [OKAY] [NO].[OKAY] .......[NO]stochastic_transformer [OKAY]....... utils .................. [YES] ...... [OKAY] .[OKAY]transformer quantizer .............. [NO] ....... [OKAY] [NO] ....... ............[OKAY] -------------------------------------------------- [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... async_io[NO] ...................... [NO] ....... [NO][NO] transformer_inference transformer_inference.. [NO] ....... ..[OKAY] [NO] ....... [OKAY] utils .................. utils[YES] ........................ [YES][OKAY] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] quantizer .............. --------------------------------------------------[NO] ....... 
[OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** transformer_inference .. [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** utils .................. [YES] ...... [OKAY] DeepSpeed general environment info: quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. -------------------------------------------------- async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO] [NO] transformer_inference transformer_inference.. ..[NO] [NO]....... .......[OKAY] [OKAY] utils .................. [YES]utils ........................ [OKAY][YES] ...... [OKAY] quantizer .............. [NO]quantizer ..................... [OKAY][NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... 
[NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference ..transformer_inference [NO].. [NO]....... .......[OKAY] [OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] ----------------------------------------------------------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer_inferencetransformer_inference .... [NO][NO] ....... .......[OKAY] [OKAY] async_ioasync_io .............................. [NO][NO] .............. [NO][NO] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] utils utils.................. ..................[YES] [YES]...... ......[OKAY] ---------------------------------------------------------------------------------------------------- [OKAY] quantizer ..............quantizer [NO].............. .......[NO] [OKAY]....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install pathtorch install path ............................................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch versiontorch version ............................................................ 1.8.11.8.11.8.1 torch cuda versiontorch cuda versiontorch cuda version ............................................. 11.111.111.1 nvcc versionnvcc versionnvcc version ............................................................... 11.211.211.2 deepspeed install pathdeepspeed install pathdeepspeed install path ................................. 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed infodeepspeed info ......................................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. deepspeed wheel compiled w. ...... ...... ...... torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] async_io ............... [NO] async_io....... ...............[NO] [NO] ....... [NO] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 transformer_inference ..transformer_inference [NO].. .......[NO] [OKAY]....... [OKAY] utilsutils .................................... [YES] [YES]...... ......[OKAY] [OKAY] quantizer .............. quantizer[NO] ..................... [NO][OKAY] ....... [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. DeepSpeed general environment info: async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] utils .................. [YES] ...... [OKAY] torch version .................... 1.8.1 quantizer .............. [NO] ....... [OKAY] torch cuda version ............... 11.1 -------------------------------------------------- nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] DeepSpeed general environment info: utils .................. [YES] ...... [OKAY] quantizer .............. 
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
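Megatron looks for a `git` binary with the shell builtin `type` so it can stamp the run with the current revision; the compute nodes have no git on $PATH, so the probe fails and the banner falls back to `unknown`, which is harmless. A sketch of the same probe-with-fallback pattern (GIT_HASH/GIT_BRANCH are illustrative variable names, not Megatron's own):

    # Record the code revision if git is available, else fall back:
    if type git >/dev/null 2>&1; then
        GIT_HASH=$(git rev-parse --short HEAD)
        GIT_BRANCH=$(git rev-parse --abbrev-ref HEAD)
    else
        GIT_HASH=unknown
        GIT_BRANCH=unknown
    fi
    echo "**** Git info for Megatron: git_hash=${GIT_HASH} git_branch=${GIT_BRANCH} ****"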
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
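Only cpu_adam was prebuilt ([YES]); the other ops are compatible and get JIT-compiled by ninja on first use. If JIT compilation at job startup is undesirable, DeepSpeed's DS_BUILD_* environment variables can request prebuilding at install time; a sketch under that assumption, shown for fused_adam as an example:

    # Prebuild one specific op instead of relying on JIT compilation:
    DS_BUILD_FUSED_ADAM=1 pip install .

    # Or prebuild all ops (slow; needs the CUDA toolkit at build time):
    DS_BUILD_OPS=1 pip install .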
torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: DeepSpeed general environment info: torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'].................... 1.8.1 torch version ....................torch cuda version ...............1.8.1 11.1 torch cuda versionnvcc version .................................... 11.111.2 nvcc versiondeepspeed install path ................................ 11.2 ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed install path deepspeed info........... ................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']0.4.2+bc17042, bc17042, big-science deepspeed infodeepspeed wheel compiled w. ......................... 0.4.2+bc17042, bc17042, big-sciencetorch 1.8, cuda 11.1 deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc versionDeepSpeed general environment info: ..................... 11.2 deepspeed install path ........... torch install path['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] /bin/sh: line 0: type: git: not found ...............deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']...... torch 1.8, cuda 11.1 torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
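Ops reported as [NO]/[OKAY] are not prebuilt but will be JIT-compiled on first use, and as the report notes this requires ninja. A quick hedged check of that one prerequisite (not DeepSpeed's own report code):

    # Ops marked [NO] above are built just-in-time, which requires ninja.
    import importlib.util
    print("ninja:", "[OKAY]" if importlib.util.find_spec("ninja") else "[MISSING]")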
[WARNING] async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
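async_io staying [NO] is harmless here: that op is only needed for NVMe/disk offload, and the warning already names the fix (`apt install libaio-dev`). A hedged probe for the runtime library from Python, assuming the usual Linux soname:

    # Probe for the libaio runtime that the async_io op would need.
    # Assumption: the soname is "libaio.so.1", as on Debian/Ubuntu.
    import ctypes
    try:
        ctypes.CDLL("libaio.so.1")
        print("libaio present")
    except OSError:
        print("libaio missing; `apt install libaio-dev` as the warning says")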
using world size: 256, data-parallel-size: 8, tensor-model-parallel size: 4, pipeline-model-parallel size: 8
using torch.float16 for parameters ...
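The layout multiplies out as expected: 8 data-parallel replicas × 4 tensor-parallel shards × 8 pipeline stages = 256 GPUs. As a one-line sanity check:

    # world_size = data_parallel * tensor_parallel * pipeline_parallel
    assert 8 * 4 * 8 == 256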
single DDP_impl ........................................ local decoder_seq_length .............................. None deepscale ....................................... False deepscale_config ................................ None deepspeed ....................................... True deepspeed_activation_checkpointing .............. True deepspeed_config ................................ ./ds_config.1185609.json deepspeed_mpi ................................... False distribute_checkpointed_activations ............. False distributed_backend ............................. nccl embedding_path .................................. None encoder_seq_length .............................. 2048 eod_mask_loss ................................... False eval_interval ................................... 1000 eval_iters ...................................... 5 evidence_data_path .............................. None exit_duration_in_mins ........................... 110 exit_interval ................................... None ffn_hidden_size ................................. 20480 finetune ........................................ False fp16 ............................................ True fp16_lm_cross_entropy ........................... False fp32_residual_connection ........................ False global_batch_size ............................... 2048 hidden_dropout .................................. 0.1 hidden_size ..................................... 16384 hysteresis ...................................... 2 ict_head_size ................................... None ict_load ........................................ None img_dim ......................................... 224 indexer_batch_size .............................. 128 indexer_log_interval ............................ 1000 init_method_std ................................. 0.02 init_method_xavier_uniform ...................... False initial_loss_scale .............................. 4294967296 kv_channels ..................................... 512 layernorm_epsilon ............................... 1e-05 lazy_mpu_init ................................... None load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints local_rank ...................................... 0 log_batch_size_to_tensorboard ................... True log_interval .................................... 10 log_learning_rate_to_tensorboard ................ True log_loss_scale_to_tensorboard ................... True log_num_zeros_in_grad ........................... False log_params_norm ................................. False log_timers_to_tensorboard ....................... True log_validation_ppl_to_tensorboard ............... True loss_scale ...................................... 12.0 loss_scale_window ............................... 1000 lr .............................................. 6e-05 lr_decay_iters .................................. None lr_decay_samples ................................ 126953125 lr_decay_style .................................. cosine lr_warmup_fraction .............................. None lr_warmup_iters ................................. 0 lr_warmup_samples ............................... 216320 make_vocab_size_divisible_by .................... 128 mask_prob ....................................... 0.15 masked_softmax_fusion ........................... True max_position_embeddings ......................... 2048 memory_centric_tiled_linear ..................... False merge_file ...................................... 
/gpfswork/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/data/gpt2-merges.txt micro_batch_size ................................ 1 min_loss_scale .................................. 1.0 min_lr .......................................... 6e-06 mmap_warmup ..................................... False no_load_optim ................................... None no_load_rng ..................................... None no_save_optim ................................... None no_save_rng ..................................... None num_attention_heads ............................. 32 num_channels .................................... 3 num_classes ..................................... 1000 num_layers ...................................... 32 num_layers_per_virtual_pipeline_stage ........... None num_workers ..................................... 2 onnx_safe ....................................... None openai_gelu ..................................... False optimizer ....................................... adam override_lr_scheduler ........................... False params_dtype .................................... torch.float16 partition_activations ........................... False patch_dim ....................................... 16 pipeline_model_parallel_size .................... 8 position_embedding_type ......................... PositionEmbeddingType.absolute profile_backward ................................ False query_in_block_prob ............................. 0.1 rampup_batch_size ............................... ['16', '16', '6_000_000'] rank ............................................ 0 remote_device ................................... none reset_attention_mask ............................ False reset_position_ids .............................. False retriever_report_topk_accuracies ................ [] retriever_score_scaling ......................... False retriever_seq_length ............................ 256 sample_rate ..................................... 1.0 save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints save_interval ................................... 1500 scatter_gather_tensors_in_pipeline .............. True scattered_embeddings ............................ False seed ............................................ 42 seq_length ...................................... 2048 sgd_momentum .................................... 0.9 short_seq_prob .................................. 0.1 split ........................................... 949,50,1 split_transformers .............................. False synchronize_each_layer .......................... False tensor_model_parallel_size ...................... 4 tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/tr8-104B-logs/tensorboard tensorboard_log_interval ........................ 1 tensorboard_queue_size .......................... 5 tile_factor ..................................... 1 titles_data_path ................................ None tokenizer_name_or_path .......................... None tokenizer_type .................................. GPT2BPETokenizer train_iters ..................................... None train_samples ................................... 300000000 use_checkpoint_lr_scheduler ..................... False use_contiguous_buffers_in_ddp ................... False use_cpu_initialization .......................... None use_one_sent_docs ............................... 
False use_pin_memory .................................. False virtual_pipeline_model_parallel_size ............ None vocab_extra_ids ................................. 0 vocab_file ...................................... /gpfswork/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/data/gpt2-vocab.json weight_decay .................................... 0.1 world_size ...................................... 256 zero_allgather_bucket_size ...................... 0.0 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found zero_contigious_gradients ....................... False zero_reduce_bucket_size ......................... 0.0 zero_reduce_scatter ............................. False zero_stage ...................................... 1 -------------------- end of arguments --------------------- will use batch size rampup starting from global batch size 16 to global batch size 2048 with batch size increments 16 over 6000000 samples. > building GPT2BPETokenizer tokenizer ... /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: 
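The rampup line above is worth unpacking: `rampup_batch_size ['16', '16', '6_000_000']` means start at a global batch size of 16 and add 16 more each time a fixed share of the 6,000,000 rampup samples has been consumed. A minimal sketch of that schedule, assuming evenly spaced increments as in Megatron's rampup scheme (the helper function is mine, for illustration only):

```python
# Sketch of the logged rampup: 16 -> 2048 in increments of 16 over 6,000,000 samples.
start, final, incr, rampup_samples = 16, 2048, 16, 6_000_000
num_increments = (final - start) // incr                   # 127 increments
samples_per_increment = rampup_samples / num_increments    # ~47,244 samples each

def global_batch_size(consumed_samples: int) -> int:
    """Global batch size after `consumed_samples` training samples (illustrative)."""
    if consumed_samples >= rampup_samples:
        return final
    steps = int(consumed_samples / samples_per_increment)
    return start + steps * incr

print(global_batch_size(0))          # 16
print(global_batch_size(100_000))    # 48, after two increments
print(global_batch_size(6_000_000))  # 2048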
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
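The 431 dummy tokens follow from the arguments block: the vocab is padded up to a multiple of make_vocab_size_divisible_by × tensor_model_parallel_size so that each tensor-parallel shard of the embedding matrix has equal size. A small sketch of the arithmetic, mirroring Megatron's padding rule as I understand it:

```python
# Reproducing "> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)".
vocab_size = 50257
make_vocab_size_divisible_by = 128
tensor_model_parallel_size = 4

# Pad to the next multiple of 128 * 4 = 512.
multiple = make_vocab_size_divisible_by * tensor_model_parallel_size
padded = ((vocab_size + multiple - 1) // multiple) * multiple
print(padded, padded - vocab_size)  # 50688 431
```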
> setting tensorboard ...
> setting codecarbon ...
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meets the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
> initializing torch distributed ...
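The parallel topology initialized next tiles the 256 ranks exactly: the model-parallel grid (tensor × pipeline) multiplied by the data-parallel degree must equal the world size, and the data-parallel degree is whatever is left over. A sketch of that consistency check (variable names are mine):

```python
# world_size = tensor_parallel * pipeline_parallel * data_parallel
world_size = 256
tensor_model_parallel_size = 4
pipeline_model_parallel_size = 8

model_parallel_size = tensor_model_parallel_size * pipeline_model_parallel_size  # 32
assert world_size % model_parallel_size == 0
data_parallel_size = world_size // model_parallel_size
print(data_parallel_size)  # 8, matching "data-parallel-size: 8" in the log
```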
[OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** > initializing tensor model parallel with size 4 > initializing pipeline model parallel with size 8 > setting random seeds to 42 ... [2021-09-25 02:35:45,964] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2760 and data parallel seed: 42 > compiling dataset index builder ... make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/data' make: Nothing to be done for 'default'. 
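These two sizes fix the third axis of the parallelism grid: with 4-way tensor parallelism and 8-way pipeline parallelism across the 256 ranks enumerated in the topology map further down, 8 data-parallel replicas remain. A minimal sketch of that bookkeeping (WORLD_SIZE is read off the 0..255 rank range in the map, not printed explicitly here):

    # a minimal sketch; TP and PP come from the two "> initializing ..." lines above
    WORLD_SIZE = 256               # ranks 0..255 in the "Using topology" map below
    TP = 4                         # tensor model parallel size
    PP = 8                         # pipeline model parallel size
    DP = WORLD_SIZE // (TP * PP)   # remaining data-parallel degree
    assert DP == 8                 # matches data=0..7 in the ProcessCoord entries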
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/data'
>>> done with dataset index builder. Compilation time: 0.305 seconds
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
!! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler PyTorch was built with for this platform, which is g++ on Linux. Please use g++ to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension.
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! WARNING !!
  warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Building extension module scaled_masked_softmax_cuda...
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Building extension module fused_mix_prec_layer_norm_cuda...
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
>>> done with compiling and loading fused kernels. Compilation time: 20.734 seconds
time to initialize megatron (seconds): -8.955
[after megatron is initialized] datetime: 2021-09-25 02:36:07
building GPT model ...
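A note on the UserWarning above, which the raw log repeats once per rank: PyTorch's JIT extension builder defaults to the generic `c++` driver and warns when that is not the g++ the wheel was built with. Assuming g++ is actually installed and on PATH on the compute nodes, one way to point the builder at it (a sketch, not something this run did) is to set CXX before the fused kernels are compiled, since torch.utils.cpp_extension reads that variable when it writes the ninja build file:

    import os

    # a minimal sketch, assuming g++ exists on the compute nodes;
    # must run before megatron compiles/loads its fused kernels
    os.environ.setdefault("CXX", "g++")

Equivalently, exporting CXX=g++ in the job's launcher script would cover every rank at once.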
[2021-09-25 02:36:07,098] [INFO] [utils.py:680:see_memory_usage] Before Building Model
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/cuda/memory.py:373: FutureWarning: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved
  warnings.warn(
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/cuda/memory.py:381: FutureWarning: torch.cuda.max_memory_cached has been renamed to torch.cuda.max_memory_reserved
  warnings.warn(
[2021-09-25 02:36:07,100] [INFO] [utils.py:681:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
[2021-09-25 02:36:07,101] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 36.67 GB, percent = 19.6%
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=0, data=1, model=0): 4, ProcessCoord(pipe=0, data=1, model=1): 5, ProcessCoord(pipe=0, data=1, model=2): 6, ProcessCoord(pipe=0, data=1, model=3): 7, ProcessCoord(pipe=0, data=2, model=0): 8, ProcessCoord(pipe=0, data=2, model=1): 9, ProcessCoord(pipe=0, data=2, model=2): 10, ProcessCoord(pipe=0, data=2, model=3): 11, ProcessCoord(pipe=0, data=3, model=0): 12, ProcessCoord(pipe=0, data=3, model=1): 13, ProcessCoord(pipe=0, data=3, model=2): 14, ProcessCoord(pipe=0, data=3, model=3): 15, ProcessCoord(pipe=0, data=4, model=0): 16, ProcessCoord(pipe=0, data=4, model=1): 17, ProcessCoord(pipe=0, data=4, model=2): 18, ProcessCoord(pipe=0, data=4, model=3): 19, ProcessCoord(pipe=0, data=5, model=0): 20, ProcessCoord(pipe=0, data=5, model=1): 21, ProcessCoord(pipe=0, data=5, model=2): 22, ProcessCoord(pipe=0, data=5, model=3): 23, ProcessCoord(pipe=0, data=6, model=0): 24, ProcessCoord(pipe=0, data=6, model=1): 25, ProcessCoord(pipe=0, data=6, model=2): 26, ProcessCoord(pipe=0, data=6, model=3): 27, ProcessCoord(pipe=0, data=7, model=0): 28, ProcessCoord(pipe=0, data=7, model=1): 29, ProcessCoord(pipe=0, data=7, model=2): 30, ProcessCoord(pipe=0, data=7, model=3): 31, ProcessCoord(pipe=1, data=0, model=0): 32, ProcessCoord(pipe=1, data=0, model=1): 33, ProcessCoord(pipe=1, data=0, model=2): 34, ProcessCoord(pipe=1, data=0, model=3): 35, ProcessCoord(pipe=1, data=1, model=0): 36, ProcessCoord(pipe=1, data=1, model=1): 37, ProcessCoord(pipe=1, data=1, model=2): 38, ProcessCoord(pipe=1, data=1, model=3): 39, ProcessCoord(pipe=1, data=2, model=0): 40, ProcessCoord(pipe=1, data=2, model=1): 41, ProcessCoord(pipe=1, data=2, model=2): 42, ProcessCoord(pipe=1, data=2, model=3): 43, ProcessCoord(pipe=1, data=3, model=0): 44, ProcessCoord(pipe=1, data=3, model=1): 45, ProcessCoord(pipe=1, data=3, model=2): 46, ProcessCoord(pipe=1, data=3, model=3): 47, ProcessCoord(pipe=1, data=4, model=0): 48, ProcessCoord(pipe=1, data=4, model=1): 49, ProcessCoord(pipe=1, data=4, model=2): 50, ProcessCoord(pipe=1, data=4, model=3): 51, ProcessCoord(pipe=1, data=5, model=0): 52, ProcessCoord(pipe=1, data=5, model=1): 53, ProcessCoord(pipe=1, data=5, model=2): 54, ProcessCoord(pipe=1, data=5, model=3): 55, ProcessCoord(pipe=1, data=6, model=0): 56, ProcessCoord(pipe=1, data=6, model=1): 57, ProcessCoord(pipe=1, data=6, model=2): 58, ProcessCoord(pipe=1, data=6, model=3): 59, ProcessCoord(pipe=1, data=7, model=0): 60, ProcessCoord(pipe=1, data=7, model=1): 61, ProcessCoord(pipe=1, data=7, model=2): 62, ProcessCoord(pipe=1, data=7, model=3): 63, ProcessCoord(pipe=2, data=0,
model=0): 64, ProcessCoord(pipe=2, data=0, model=1): 65, ProcessCoord(pipe=2, data=0, model=2): 66, ProcessCoord(pipe=2, data=0, model=3): 67, ProcessCoord(pipe=2, data=1, model=0): 68, ProcessCoord(pipe=2, data=1, model=1): 69, ProcessCoord(pipe=2, data=1, model=2): 70, ProcessCoord(pipe=2, data=1, model=3): 71, ProcessCoord(pipe=2, data=2, model=0): 72, ProcessCoord(pipe=2, data=2, model=1): 73, ProcessCoord(pipe=2, data=2, model=2): 74, ProcessCoord(pipe=2, data=2, model=3): 75, ProcessCoord(pipe=2, data=3, model=0): 76, ProcessCoord(pipe=2, data=3, model=1): 77, ProcessCoord(pipe=2, data=3, model=2): 78, ProcessCoord(pipe=2, data=3, model=3): 79, ProcessCoord(pipe=2, data=4, model=0): 80, ProcessCoord(pipe=2, data=4, model=1): 81, ProcessCoord(pipe=2, data=4, model=2): 82, ProcessCoord(pipe=2, data=4, model=3): 83, ProcessCoord(pipe=2, data=5, model=0): 84, ProcessCoord(pipe=2, data=5, model=1): 85, ProcessCoord(pipe=2, data=5, model=2): 86, ProcessCoord(pipe=2, data=5, model=3): 87, ProcessCoord(pipe=2, data=6, model=0): 88, ProcessCoord(pipe=2, data=6, model=1): 89, ProcessCoord(pipe=2, data=6, model=2): 90, ProcessCoord(pipe=2, data=6, model=3): 91, ProcessCoord(pipe=2, data=7, model=0): 92, ProcessCoord(pipe=2, data=7, model=1): 93, ProcessCoord(pipe=2, data=7, model=2): 94, ProcessCoord(pipe=2, data=7, model=3): 95, ProcessCoord(pipe=3, data=0, model=0): 96, ProcessCoord(pipe=3, data=0, model=1): 97, ProcessCoord(pipe=3, data=0, model=2): 98, ProcessCoord(pipe=3, data=0, model=3): 99, ProcessCoord(pipe=3, data=1, model=0): 100, ProcessCoord(pipe=3, data=1, model=1): 101, ProcessCoord(pipe=3, data=1, model=2): 102, ProcessCoord(pipe=3, data=1, model=3): 103, ProcessCoord(pipe=3, data=2, model=0): 104, ProcessCoord(pipe=3, data=2, model=1): 105, ProcessCoord(pipe=3, data=2, model=2): 106, ProcessCoord(pipe=3, data=2, model=3): 107, ProcessCoord(pipe=3, data=3, model=0): 108, ProcessCoord(pipe=3, data=3, model=1): 109, ProcessCoord(pipe=3, data=3, model=2): 110, ProcessCoord(pipe=3, data=3, model=3): 111, ProcessCoord(pipe=3, data=4, model=0): 112, ProcessCoord(pipe=3, data=4, model=1): 113, ProcessCoord(pipe=3, data=4, model=2): 114, ProcessCoord(pipe=3, data=4, model=3): 115, ProcessCoord(pipe=3, data=5, model=0): 116, ProcessCoord(pipe=3, data=5, model=1): 117, ProcessCoord(pipe=3, data=5, model=2): 118, ProcessCoord(pipe=3, data=5, model=3): 119, ProcessCoord(pipe=3, data=6, model=0): 120, ProcessCoord(pipe=3, data=6, model=1): 121, ProcessCoord(pipe=3, data=6, model=2): 122, ProcessCoord(pipe=3, data=6, model=3): 123, ProcessCoord(pipe=3, data=7, model=0): 124, ProcessCoord(pipe=3, data=7, model=1): 125, ProcessCoord(pipe=3, data=7, model=2): 126, ProcessCoord(pipe=3, data=7, model=3): 127, ProcessCoord(pipe=4, data=0, model=0): 128, ProcessCoord(pipe=4, data=0, model=1): 129, ProcessCoord(pipe=4, data=0, model=2): 130, ProcessCoord(pipe=4, data=0, model=3): 131, ProcessCoord(pipe=4, data=1, model=0): 132, ProcessCoord(pipe=4, data=1, model=1): 133, ProcessCoord(pipe=4, data=1, model=2): 134, ProcessCoord(pipe=4, data=1, model=3): 135, ProcessCoord(pipe=4, data=2, model=0): 136, ProcessCoord(pipe=4, data=2, model=1): 137, ProcessCoord(pipe=4, data=2, model=2): 138, ProcessCoord(pipe=4, data=2, model=3): 139, ProcessCoord(pipe=4, data=3, model=0): 140, ProcessCoord(pipe=4, data=3, model=1): 141, ProcessCoord(pipe=4, data=3, model=2): 142, ProcessCoord(pipe=4, data=3, model=3): 143, ProcessCoord(pipe=4, data=4, model=0): 144, ProcessCoord(pipe=4, data=4, model=1): 145, 
ProcessCoord(pipe=4, data=4, model=2): 146, ProcessCoord(pipe=4, data=4, model=3): 147, ProcessCoord(pipe=4, data=5, model=0): 148, ProcessCoord(pipe=4, data=5, model=1): 149, ProcessCoord(pipe=4, data=5, model=2): 150, ProcessCoord(pipe=4, data=5, model=3): 151, ProcessCoord(pipe=4, data=6, model=0): 152, ProcessCoord(pipe=4, data=6, model=1): 153, ProcessCoord(pipe=4, data=6, model=2): 154, ProcessCoord(pipe=4, data=6, model=3): 155, ProcessCoord(pipe=4, data=7, model=0): 156, ProcessCoord(pipe=4, data=7, model=1): 157, ProcessCoord(pipe=4, data=7, model=2): 158, ProcessCoord(pipe=4, data=7, model=3): 159, ProcessCoord(pipe=5, data=0, model=0): 160, ProcessCoord(pipe=5, data=0, model=1): 161, ProcessCoord(pipe=5, data=0, model=2): 162, ProcessCoord(pipe=5, data=0, model=3): 163, ProcessCoord(pipe=5, data=1, model=0): 164, ProcessCoord(pipe=5, data=1, model=1): 165, ProcessCoord(pipe=5, data=1, model=2): 166, ProcessCoord(pipe=5, data=1, model=3): 167, ProcessCoord(pipe=5, data=2, model=0): 168, ProcessCoord(pipe=5, data=2, model=1): 169, ProcessCoord(pipe=5, data=2, model=2): 170, ProcessCoord(pipe=5, data=2, model=3): 171, ProcessCoord(pipe=5, data=3, model=0): 172, ProcessCoord(pipe=5, data=3, model=1): 173, ProcessCoord(pipe=5, data=3, model=2): 174, ProcessCoord(pipe=5, data=3, model=3): 175, ProcessCoord(pipe=5, data=4, model=0): 176, ProcessCoord(pipe=5, data=4, model=1): 177, ProcessCoord(pipe=5, data=4, model=2): 178, ProcessCoord(pipe=5, data=4, model=3): 179, ProcessCoord(pipe=5, data=5, model=0): 180, ProcessCoord(pipe=5, data=5, model=1): 181, ProcessCoord(pipe=5, data=5, model=2): 182, ProcessCoord(pipe=5, data=5, model=3): 183, ProcessCoord(pipe=5, data=6, model=0): 184, ProcessCoord(pipe=5, data=6, model=1): 185, ProcessCoord(pipe=5, data=6, model=2): 186, ProcessCoord(pipe=5, data=6, model=3): 187, ProcessCoord(pipe=5, data=7, model=0): 188, ProcessCoord(pipe=5, data=7, model=1): 189, ProcessCoord(pipe=5, data=7, model=2): 190, ProcessCoord(pipe=5, data=7, model=3): 191, ProcessCoord(pipe=6, data=0, model=0): 192, ProcessCoord(pipe=6, data=0, model=1): 193, ProcessCoord(pipe=6, data=0, model=2): 194, ProcessCoord(pipe=6, data=0, model=3): 195, ProcessCoord(pipe=6, data=1, model=0): 196, ProcessCoord(pipe=6, data=1, model=1): 197, ProcessCoord(pipe=6, data=1, model=2): 198, ProcessCoord(pipe=6, data=1, model=3): 199, ProcessCoord(pipe=6, data=2, model=0): 200, ProcessCoord(pipe=6, data=2, model=1): 201, ProcessCoord(pipe=6, data=2, model=2): 202, ProcessCoord(pipe=6, data=2, model=3): 203, ProcessCoord(pipe=6, data=3, model=0): 204, ProcessCoord(pipe=6, data=3, model=1): 205, ProcessCoord(pipe=6, data=3, model=2): 206, ProcessCoord(pipe=6, data=3, model=3): 207, ProcessCoord(pipe=6, data=4, model=0): 208, ProcessCoord(pipe=6, data=4, model=1): 209, ProcessCoord(pipe=6, data=4, model=2): 210, ProcessCoord(pipe=6, data=4, model=3): 211, ProcessCoord(pipe=6, data=5, model=0): 212, ProcessCoord(pipe=6, data=5, model=1): 213, ProcessCoord(pipe=6, data=5, model=2): 214, ProcessCoord(pipe=6, data=5, model=3): 215, ProcessCoord(pipe=6, data=6, model=0): 216, ProcessCoord(pipe=6, data=6, model=1): 217, ProcessCoord(pipe=6, data=6, model=2): 218, ProcessCoord(pipe=6, data=6, model=3): 219, ProcessCoord(pipe=6, data=7, model=0): 220, ProcessCoord(pipe=6, data=7, model=1): 221, ProcessCoord(pipe=6, data=7, model=2): 222, ProcessCoord(pipe=6, data=7, model=3): 223, ProcessCoord(pipe=7, data=0, model=0): 224, ProcessCoord(pipe=7, data=0, model=1): 225, ProcessCoord(pipe=7, data=0, 
model=2): 226, ProcessCoord(pipe=7, data=0, model=3): 227, ProcessCoord(pipe=7, data=1, model=0): 228, ProcessCoord(pipe=7, data=1, model=1): 229, ProcessCoord(pipe=7, data=1, model=2): 230, ProcessCoord(pipe=7, data=1, model=3): 231, ProcessCoord(pipe=7, data=2, model=0): 232, ProcessCoord(pipe=7, data=2, model=1): 233, ProcessCoord(pipe=7, data=2, model=2): 234, ProcessCoord(pipe=7, data=2, model=3): 235, ProcessCoord(pipe=7, data=3, model=0): 236, ProcessCoord(pipe=7, data=3, model=1): 237, ProcessCoord(pipe=7, data=3, model=2): 238, ProcessCoord(pipe=7, data=3, model=3): 239, ProcessCoord(pipe=7, data=4, model=0): 240, ProcessCoord(pipe=7, data=4, model=1): 241, ProcessCoord(pipe=7, data=4, model=2): 242, ProcessCoord(pipe=7, data=4, model=3): 243, ProcessCoord(pipe=7, data=5, model=0): 244, ProcessCoord(pipe=7, data=5, model=1): 245, ProcessCoord(pipe=7, data=5, model=2): 246, ProcessCoord(pipe=7, data=5, model=3): 247, ProcessCoord(pipe=7, data=6, model=0): 248, ProcessCoord(pipe=7, data=6, model=1): 249, ProcessCoord(pipe=7, data=6, model=2): 250, ProcessCoord(pipe=7, data=6, model=3): 251, ProcessCoord(pipe=7, data=7, model=0): 252, ProcessCoord(pipe=7, data=7, model=1): 253, ProcessCoord(pipe=7, data=7, model=2): 254, ProcessCoord(pipe=7, data=7, model=3): 255}
[2021-09-25 02:36:08,503] [INFO] [module.py:360:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=7
     0: _to_float16
     1: EmbeddingPipe
     2:
     3: ParallelTransformerLayerPipe
     4: ParallelTransformerLayerPipe
     5: ParallelTransformerLayerPipe
     6: ParallelTransformerLayerPipe
stage=1 layers=4
     7: ParallelTransformerLayerPipe
     8: ParallelTransformerLayerPipe
     9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
stage=2 layers=4
    11: ParallelTransformerLayerPipe
    12: ParallelTransformerLayerPipe
    13: ParallelTransformerLayerPipe
    14: ParallelTransformerLayerPipe
stage=3 layers=4
    15: ParallelTransformerLayerPipe
    16: ParallelTransformerLayerPipe
    17: ParallelTransformerLayerPipe
    18: ParallelTransformerLayerPipe
stage=4 layers=4
    19: ParallelTransformerLayerPipe
    20: ParallelTransformerLayerPipe
    21: ParallelTransformerLayerPipe
    22: ParallelTransformerLayerPipe
stage=5 layers=4
    23: ParallelTransformerLayerPipe
    24: ParallelTransformerLayerPipe
    25: ParallelTransformerLayerPipe
    26: ParallelTransformerLayerPipe
stage=6 layers=4
    27: ParallelTransformerLayerPipe
    28: ParallelTransformerLayerPipe
    29: ParallelTransformerLayerPipe
    30: ParallelTransformerLayerPipe
stage=7 layers=8
    31: ParallelTransformerLayerPipe
    32: ParallelTransformerLayerPipe
    33: ParallelTransformerLayerPipe
    34: ParallelTransformerLayerPipe
    35:
    36: MixedFusedLayerNorm
    37: EmbeddingPipe
    38: float16_to_fp32
  loss: CrossEntropy
> number of parameters on (tensor, pipeline) model parallel rank (2, 2): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (3, 1): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (2, 1): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (0, 1): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (2, 5): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (1, 5): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (0, 5): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (3, 5): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (2, 4): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (0, 4): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (1, 4): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (3, 4): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (3, 6): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (3, 3): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (2, 6): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (0, 6): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (1, 6): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (3, 2): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (2, 3): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (1, 2): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (0, 2): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (1, 1): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (1, 3): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (0, 3): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (1, 7): 1986498560
> number of parameters on (tensor, pipeline) model parallel rank (1, 0): 1986465792
> number of parameters on (tensor, pipeline) model parallel rank (2, 0): 1986465792
> number of parameters on (tensor, pipeline) model parallel rank (2, 7): 1986498560
> number of parameters on (tensor, pipeline) model parallel rank (0, 7): 1986498560
> number of parameters on (tensor, pipeline) model parallel rank (3, 0): 1986465792
> number of parameters on (tensor, pipeline) model parallel rank (3, 7): 1986498560
[2021-09-25 02:36:09,735] [INFO] [utils.py:680:see_memory_usage] After Building Model
[2021-09-25 02:36:09,736] [INFO] [utils.py:681:see_memory_usage] MA 3.77 GB Max_MA 3.79 GB CA 3.79 GB Max_CA 4 GB
[2021-09-25 02:36:09,737] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 36.85 GB, percent = 19.7%
> number of parameters on (tensor, pipeline) model parallel rank (0, 0): 1986465792
setting training iterations to 159576
> learning rate decay style: cosine
DeepSpeed is enabled.
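Two cross-checks on the numbers above, as a sketch on my part rather than anything the run printed. The topology map enumerates ranks pipe-major, then data, then model, so a global rank is recoverable from its coordinates; and the per-stage parameter counts are consistent with the input and output EmbeddingPipe being tied, hence double-counted in the TOTAL_PARAMS/UNIQUE_PARAMS pair printed further down:

    # a minimal sketch; every constant is copied from the log
    TP, DP = 4, 8

    def global_rank(pipe, data, model):
        # ordering implied by the "Using topology" map: pipe-major, then data, then model
        return (pipe * DP + data) * TP + model

    assert global_rank(0, 0, 0) == 0
    assert global_rank(2, 0, 2) == 66     # ProcessCoord(pipe=2, data=0, model=2): 66
    assert global_rank(7, 7, 3) == 255    # last entry of the map

    # per-tensor-parallel-rank parameter counts from the "> number of parameters" lines
    middle_stage = 1_745_293_312   # stages 1..6: 4 ParallelTransformerLayerPipe each
    first_stage  = 1_986_465_792   # stage 0: 4 layers + input EmbeddingPipe
    last_stage   = 1_986_498_560   # stage 7: 4 layers + tied EmbeddingPipe + MixedFusedLayerNorm

    embedding_per_rank = first_stage - middle_stage   # 241_172_480
    assert last_stage - first_stage == 32_768         # presumably the final LayerNorm weights

    # TOTAL_PARAMS - UNIQUE_PARAMS (see the engine.py lines below) equals exactly one
    # duplicated embedding copy summed over the 4 tensor-parallel ranks
    assert 57_778_896_896 - 56_814_206_976 == embedding_per_rank * TP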
[2021-09-25 02:36:09,793] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.4.2+bc17042, git-hash=bc17042, git-branch=big-science
[2021-09-25 02:36:09,889] [INFO] [engine.py:179:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-09-25 02:36:09,889] [INFO] [engine.py:736:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-09-25 02:36:09,890] [INFO] [engine.py:741:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-09-25 02:36:09,890] [INFO] [engine.py:750:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2021-09-25 02:36:09,890] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2021-09-25 02:36:09,890] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2021-09-25 02:36:09,890] [INFO] [stage2.py:106:__init__] Reduce bucket size 500000000
[2021-09-25 02:36:09,890] [INFO] [stage2.py:107:__init__] Allgather bucket size 500000000
[2021-09-25 02:36:09,890] [INFO] [stage2.py:108:__init__] CPU Offload: False
[2021-09-25 02:36:09,890] [INFO] [stage2.py:109:__init__] Round robin gradient partitioning: False
[2021-09-25 02:36:14,495] [INFO] [stage2.py:419:__init__] optimizer state initialized
[2021-09-25 02:36:14,495] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2021-09-25 02:36:14,495] [INFO] [engine.py:553:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2021-09-25 02:36:14,495] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2021-09-25 02:36:14,495] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-09-25 02:36:14,495] [INFO] [config.py:900:print] DeepSpeedEngine configuration:
[2021-09-25 02:36:14,495] [INFO] [config.py:904:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false }
[2021-09-25 02:36:14,495] [INFO] [config.py:904:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2021-09-25 02:36:14,495] [INFO] [config.py:904:print] allreduce_always_fp32 ........ False
[2021-09-25 02:36:14,495] [INFO] [config.py:904:print] amp_enabled .................. False
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] amp_params ................... False
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] checkpoint_tag_validation_enabled True
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] checkpoint_tag_validation_fail False
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] disable_allgather ............ False
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] dump_state ................... False
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] eigenvalue_enabled ........... False
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] eigenvalue_gas_boundary_resolution 1
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] eigenvalue_layer_name ........ bert.encoder.layer
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] eigenvalue_layer_num ......... 0
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] eigenvalue_max_iter .......... 100
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] eigenvalue_stability ......... 1e-06
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] eigenvalue_tol ............... 0.01
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] eigenvalue_verbose ........... False
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] elasticity_enabled ........... False
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null }
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] fp16_enabled ................. True
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] fp16_mixed_quantize .......... False
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] global_rank .................. 0
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] gradient_accumulation_steps .. 256
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] gradient_clipping ............ 1.0
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] gradient_predivide_factor .... 1.0
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] initial_dynamic_scale ........ 4096
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] loss_scale ................... 0
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] memory_breakdown ............. False
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] optimizer_legacy_fusion ...... False
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] optimizer_name ............... None
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] optimizer_params ............. None
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] pld_enabled .................. False
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] pld_params ................... False
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] prescale_gradients ........... False
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] quantize_change_rate ......... 0.001
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] quantize_groups .............. 1
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] quantize_offset .............. 1000
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] quantize_period .............. 1000
[2021-09-25 02:36:14,496] [INFO] [config.py:904:print] quantize_rounding ............ 0
[2021-09-25 02:36:14,497] [INFO] [config.py:904:print] quantize_start_bits .......... 16
[2021-09-25 02:36:14,497] [INFO] [config.py:904:print] quantize_target_bits ......... 8
[2021-09-25 02:36:14,497] [INFO] [config.py:904:print] quantize_training_enabled .... False
[2021-09-25 02:36:14,497] [INFO] [config.py:904:print] quantize_type ................ 0
[2021-09-25 02:36:14,497] [INFO] [config.py:904:print] quantize_verbose ............. False
[2021-09-25 02:36:14,497] [INFO] [config.py:904:print] scheduler_name ............... None
[2021-09-25 02:36:14,497] [INFO] [config.py:904:print] scheduler_params ............. None
[2021-09-25 02:36:14,497] [INFO] [config.py:904:print] sparse_attention ............. None
[2021-09-25 02:36:14,497] [INFO] [config.py:904:print] sparse_gradients_enabled ..... False
[2021-09-25 02:36:14,497] [INFO] [config.py:904:print] steps_per_print .............. 2000
[2021-09-25 02:36:14,497] [INFO] [config.py:904:print] tensorboard_enabled .......... False
[2021-09-25 02:36:14,497] [INFO] [config.py:904:print] tensorboard_job_name ......... DeepSpeedJobName
[2021-09-25 02:36:14,497] [INFO] [config.py:904:print] tensorboard_output_path ......
[2021-09-25 02:36:14,497] [INFO] [config.py:904:print] train_batch_size ............. 2048
[2021-09-25 02:36:14,497] [INFO] [config.py:904:print] train_micro_batch_size_per_gpu 1
[2021-09-25 02:36:14,497] [INFO] [config.py:904:print] use_quantizer_kernel ......... False
[2021-09-25 02:36:14,497] [INFO] [config.py:904:print] wall_clock_breakdown ......... False
[2021-09-25 02:36:14,497] [INFO] [config.py:904:print] world_size ................... 8
[2021-09-25 02:36:14,497] [INFO] [config.py:904:print] zero_allow_untested_optimizer False
[2021-09-25 02:36:14,497] [INFO] [config.py:904:print] zero_config .................. { "stage": 1, "contiguous_gradients": false, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": true, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_fp16_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false }
[2021-09-25 02:36:14,497] [INFO] [config.py:904:print] zero_enabled ................. True
[2021-09-25 02:36:14,497] [INFO] [config.py:904:print] zero_optimization_stage ...... 1
[2021-09-25 02:36:14,497] [INFO] [config.py:906:print] json = { "train_micro_batch_size_per_gpu": 1, "train_batch_size": 2.048000e+03, "gradient_clipping": 1.0, "zero_optimization": { "stage": 1 }, "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12 }, "steps_per_print": 2.000000e+03, "wall_clock_breakdown": false }
[2021-09-25 02:36:14,497] [INFO] [engine.py:76:__init__] CONFIG: micro_batches=256 micro_batch_size=1
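The CONFIG line just above ties out with the config dump. As a quick sketch of the batch-size identity DeepSpeed enforces at engine construction, note that the `world_size ... 8` printed in the dump is the data-parallel world size (the global job has 256 ranks), which is what enters the formula:

    # a minimal sketch; values copied from the config dump above
    micro_batch_per_gpu = 1     # train_micro_batch_size_per_gpu
    grad_accum_steps    = 256   # gradient_accumulation_steps (== micro_batches above)
    dp_world_size       = 8     # "world_size ................... 8"
    assert micro_batch_per_gpu * grad_accum_steps * dp_world_size == 2048  # train_batch_size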
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=66 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=193 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=194 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=195 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=129 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=130 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=128 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=227 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=224 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=225 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=226 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=98 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=160 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=163 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=161 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=162 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=192 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=131 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=34 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=32 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=35 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=33 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=97 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=99 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=96 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
> using checkpoint value 6e-05 for learning rate
> using checkpoint value 6e-06 for minimum learning rate
> using checkpoint value 216320 for warmup iterations
> using checkpoint value 126953125 for total number of iterations
> using checkpoint value cosine for decay style
successfully loaded 8 ZeRO state_dicts for rank 180 successfully loaded 8 ZeRO state_dicts for rank 108 successfully loaded 8 ZeRO state_dicts for rank 206 successfully loaded 8 ZeRO state_dicts for rank 168 successfully loaded 8 ZeRO state_dicts for rank 167 successfully loaded 8 ZeRO state_dicts for rank 183 successfully loaded 8 ZeRO state_dicts for rank 112 successfully loaded 8 ZeRO state_dicts for rank 60 successfully loaded 8 ZeRO state_dicts for rank 56 successfully loaded 8 ZeRO state_dicts for rank 63 successfully loaded 8 ZeRO state_dicts for rank 222 successfully loaded 8 ZeRO state_dicts for rank 52 successfully loaded 8 ZeRO state_dicts for rank 177 successfully loaded 8 ZeRO state_dicts for rank 104 successfully loaded 8 ZeRO state_dicts for rank 164 successfully loaded 8 ZeRO state_dicts for rank 176 successfully loaded 8 ZeRO state_dicts for rank 110 successfully loaded 8 ZeRO state_dicts for rank 58 successfully loaded 8 ZeRO state_dicts for rank 178 successfully loaded 8 ZeRO state_dicts for rank 184 successfully loaded 8 ZeRO state_dicts for rank 116 successfully loaded 8 ZeRO state_dicts for rank 127 successfully loaded 8 ZeRO state_dicts for rank 96 successfully loaded 8 ZeRO state_dicts for rank 172 successfully loaded 8 ZeRO state_dicts for rank 188 successfully loaded 8 ZeRO state_dicts for rank 61 successfully loaded 8 ZeRO state_dicts for rank 182 successfully loaded 8 ZeRO state_dicts for rank 204 successfully loaded 8 ZeRO state_dicts for rank 62 successfully loaded 8 ZeRO state_dicts for rank 170 successfully
loaded 8 ZeRO state_dicts for rank 124 successfully loaded 8 ZeRO state_dicts for rank 109 successfully loaded 8 ZeRO state_dicts for rank 44 successfully loaded 8 ZeRO state_dicts for rank 166 successfully loaded 8 ZeRO state_dicts for rank 59 successfully loaded 8 ZeRO state_dicts for rank 113 successfully loaded 8 ZeRO state_dicts for rank 200 successfully loaded 8 ZeRO state_dicts for rank 185 successfully loaded 8 ZeRO state_dicts for rank 15 successfully loaded 8 ZeRO state_dicts for rank 214 successfully loaded 8 ZeRO state_dicts for rank 143 successfully loaded 8 ZeRO state_dicts for rank 171 successfully loaded 8 ZeRO state_dicts for rank 169 successfully loaded 8 ZeRO state_dicts for rank 20 successfully loaded 8 ZeRO state_dicts for rank 198 successfully loaded 8 ZeRO state_dicts for rank 161 successfully loaded 8 ZeRO state_dicts for rank 57 successfully loaded 8 ZeRO state_dicts for rank 220 successfully loaded 8 ZeRO state_dicts for rank 158 successfully loaded 8 ZeRO state_dicts for rank 81 successfully loaded 8 ZeRO state_dicts for rank 111 successfully loaded 8 ZeRO state_dicts for rank 120 successfully loaded 8 ZeRO state_dicts for rank 211 successfully loaded 8 ZeRO state_dicts for rank 221 successfully loaded 8 ZeRO state_dicts for rank 16 successfully loaded 8 ZeRO state_dicts for rank 186 successfully loaded 8 ZeRO state_dicts for rank 223 successfully loaded 8 ZeRO state_dicts for rank 93 successfully loaded 8 ZeRO state_dicts for rank 95 successfully loaded 8 ZeRO state_dicts for rank 105 successfully loaded 8 ZeRO state_dicts for rank 21 successfully loaded 8 ZeRO state_dicts for rank 207 successfully loaded 8 ZeRO state_dicts for rank 107 successfully loaded 8 ZeRO state_dicts for rank 194 successfully loaded 8 ZeRO state_dicts for rank 142 successfully loaded 8 ZeRO state_dicts for rank 51 successfully loaded 8 ZeRO state_dicts for rank 209 successfully loaded 8 ZeRO state_dicts for rank 128 successfully loaded 8 ZeRO state_dicts for rank 160 successfully loaded 8 ZeRO state_dicts for rank 83 successfully loaded 8 ZeRO state_dicts for rank 97 successfully loaded 8 ZeRO state_dicts for rank 76 successfully loaded 8 ZeRO state_dicts for rank 135 successfully loaded 8 ZeRO state_dicts for rank 100 successfully loaded 8 ZeRO state_dicts for rank 174 successfully loaded 8 ZeRO state_dicts for rank 23 successfully loaded 8 ZeRO state_dicts for rank 121 successfully loaded 8 ZeRO state_dicts for rank 80 successfully loaded 8 ZeRO state_dicts for rank 75 successfully loaded 8 ZeRO state_dicts for rank 140 successfully loaded 8 ZeRO state_dicts for rank 205 loading 8 zero partition checkpoints for rank 180 successfully loaded 8 ZeRO state_dicts for rank 190 successfully loaded 8 ZeRO state_dicts for rank 215 successfully loaded 8 ZeRO state_dicts for rank 48 successfully loaded 8 ZeRO state_dicts for rank 202 successfully loaded 8 ZeRO state_dicts for rank 196 loading 8 zero partition checkpoints for rank 206 successfully loaded 8 ZeRO state_dicts for rank 165 loading 8 zero partition checkpoints for rank 108 successfully loaded 8 ZeRO state_dicts for rank 179 successfully loaded 8 ZeRO state_dicts for rank 175 successfully loaded 8 ZeRO state_dicts for rank 187 successfully loaded 8 ZeRO state_dicts for rank 126 successfully loaded 8 ZeRO state_dicts for rank 13 successfully loaded 8 ZeRO state_dicts for rank 36 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 02:36:41 CEST)" 
was missed by 0:00:03.258297 successfully loaded 8 ZeRO state_dicts for rank 199 successfully loaded 8 ZeRO state_dicts for rank 55 successfully loaded 8 ZeRO state_dicts for rank 99 successfully loaded 8 ZeRO state_dicts for rank 115 successfully loaded 8 ZeRO state_dicts for rank 72 successfully loaded 8 ZeRO state_dicts for rank 162 successfully loaded 8 ZeRO state_dicts for rank 203 successfully loaded 8 ZeRO state_dicts for rank 22 successfully loaded 8 ZeRO state_dicts for rank 210 loading 8 zero partition checkpoints for rank 183 successfully loaded 8 ZeRO state_dicts for rank 82 successfully loaded 8 ZeRO state_dicts for rank 35 successfully loaded 8 ZeRO state_dicts for rank 129 successfully loaded 8 ZeRO state_dicts for rank 131 successfully loaded 8 ZeRO state_dicts for rank 192 successfully loaded 8 ZeRO state_dicts for rank 130 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 02:36:42 CEST)" was missed by 0:00:03.400033 successfully loaded 8 ZeRO state_dicts for rank 156 successfully loaded 8 ZeRO state_dicts for rank 157 successfully loaded 8 ZeRO state_dicts for rank 208 successfully loaded 8 ZeRO state_dicts for rank 141 successfully loaded 8 ZeRO state_dicts for rank 181 successfully loaded 8 ZeRO state_dicts for rank 92 loading 8 zero partition checkpoints for rank 167 successfully loaded 8 ZeRO state_dicts for rank 12 successfully loaded 8 ZeRO state_dicts for rank 18 successfully loaded 8 ZeRO state_dicts for rank 118 successfully loaded 8 ZeRO state_dicts for rank 19 successfully loaded 8 ZeRO state_dicts for rank 32 successfully loaded 8 ZeRO state_dicts for rank 173 successfully loaded 8 ZeRO state_dicts for rank 236 successfully loaded 8 ZeRO state_dicts for rank 224 successfully loaded 8 ZeRO state_dicts for rank 132 successfully loaded 8 ZeRO state_dicts for rank 195 successfully loaded 8 ZeRO state_dicts for rank 69 successfully loaded 8 ZeRO state_dicts for rank 65 successfully loaded 8 ZeRO state_dicts for rank 41 successfully loaded 8 ZeRO state_dicts for rank 189 successfully loaded 8 ZeRO state_dicts for rank 71 successfully loaded 8 ZeRO state_dicts for rank 79 successfully loaded 8 ZeRO state_dicts for rank 87 successfully loaded 8 ZeRO state_dicts for rank 138 successfully loaded 8 ZeRO state_dicts for rank 212 successfully loaded 8 ZeRO state_dicts for rank 197 successfully loaded 8 ZeRO state_dicts for rank 8 successfully loaded 8 ZeRO state_dicts for rank 134 successfully loaded 8 ZeRO state_dicts for rank 14 successfully loaded 8 ZeRO state_dicts for rank 39 successfully loaded 8 ZeRO state_dicts for rank 201 successfully loaded 8 ZeRO state_dicts for rank 88 successfully loaded 8 ZeRO state_dicts for rank 125 successfully loaded 8 ZeRO state_dicts for rank 91 successfully loaded 8 ZeRO state_dicts for rank 163 successfully loaded 8 ZeRO state_dicts for rank 114 successfully loaded 8 ZeRO state_dicts for rank 237 successfully loaded 8 ZeRO state_dicts for rank 45 successfully loaded 8 ZeRO state_dicts for rank 193 successfully loaded 8 ZeRO state_dicts for rank 106 loading 8 zero partition checkpoints for rank 56 successfully loaded 8 ZeRO state_dicts for rank 218 successfully loaded 8 ZeRO state_dicts for rank 243 successfully loaded 8 ZeRO state_dicts for rank 25 successfully loaded 8 ZeRO state_dicts for rank 98 successfully loaded 8 ZeRO state_dicts for rank 245 loading 8 zero partition checkpoints for rank 112 successfully loaded 8 ZeRO state_dicts for 
rank 240 successfully loaded 8 ZeRO state_dicts for rank 213 loading 8 zero partition checkpoints for rank 60 successfully loaded 8 ZeRO state_dicts for rank 103 successfully loaded 8 ZeRO state_dicts for rank 0 successfully loaded 8 ZeRO state_dicts for rank 191 successfully loaded 8 ZeRO state_dicts for rank 149 successfully loaded 8 ZeRO state_dicts for rank 252 successfully loaded 8 ZeRO state_dicts for rank 67 successfully loaded 8 ZeRO state_dicts for rank 136 successfully loaded 8 ZeRO state_dicts for rank 49 successfully loaded 8 ZeRO state_dicts for rank 54 successfully loaded 8 ZeRO state_dicts for rank 119 successfully loaded 8 ZeRO state_dicts for rank 77 successfully loaded 8 ZeRO state_dicts for rank 73 loading 8 zero partition checkpoints for rank 168 successfully loaded 8 ZeRO state_dicts for rank 238 successfully loaded 8 ZeRO state_dicts for rank 139 successfully loaded 8 ZeRO state_dicts for rank 159 successfully loaded 8 ZeRO state_dicts for rank 94 successfully loaded 8 ZeRO state_dicts for rank 147 successfully loaded 8 ZeRO state_dicts for rank 3 successfully loaded 8 ZeRO state_dicts for rank 27 successfully loaded 8 ZeRO state_dicts for rank 233 successfully loaded 8 ZeRO state_dicts for rank 84 successfully loaded 8 ZeRO state_dicts for rank 17 successfully loaded 8 ZeRO state_dicts for rank 117 successfully loaded 8 ZeRO state_dicts for rank 137 successfully loaded 8 ZeRO state_dicts for rank 144 successfully loaded 8 ZeRO state_dicts for rank 133 successfully loaded 8 ZeRO state_dicts for rank 11 successfully loaded 8 ZeRO state_dicts for rank 101 successfully loaded 8 ZeRO state_dicts for rank 248 successfully loaded 8 ZeRO state_dicts for rank 90 successfully loaded 8 ZeRO state_dicts for rank 40 successfully loaded 8 ZeRO state_dicts for rank 46 successfully loaded 8 ZeRO state_dicts for rank 47 successfully loaded 8 ZeRO state_dicts for rank 78 successfully loaded 8 ZeRO state_dicts for rank 242 loading 8 zero partition checkpoints for rank 52 successfully loaded 8 ZeRO state_dicts for rank 239 successfully loaded 8 ZeRO state_dicts for rank 9 successfully loaded 8 ZeRO state_dicts for rank 74 successfully loaded 8 ZeRO state_dicts for rank 225 successfully loaded 8 ZeRO state_dicts for rank 68 successfully loaded 8 ZeRO state_dicts for rank 146 loading 8 zero partition checkpoints for rank 104 successfully loaded 8 ZeRO state_dicts for rank 122 successfully loaded 8 ZeRO state_dicts for rank 2 successfully loaded 8 ZeRO state_dicts for rank 123 successfully loaded 8 ZeRO state_dicts for rank 37 successfully loaded 8 ZeRO state_dicts for rank 53 loading 8 zero partition checkpoints for rank 176 successfully loaded 8 ZeRO state_dicts for rank 150 successfully loaded 8 ZeRO state_dicts for rank 28 successfully loaded 8 ZeRO state_dicts for rank 86 successfully loaded 8 ZeRO state_dicts for rank 234 successfully loaded 8 ZeRO state_dicts for rank 244 successfully loaded 8 ZeRO state_dicts for rank 226 loading 8 zero partition checkpoints for rank 110 successfully loaded 8 ZeRO state_dicts for rank 145 successfully loaded 8 ZeRO state_dicts for rank 228 successfully loaded 8 ZeRO state_dicts for rank 217 successfully loaded 8 ZeRO state_dicts for rank 216 successfully loaded 8 ZeRO state_dicts for rank 152 loading 8 zero partition checkpoints for rank 184 successfully loaded 8 ZeRO state_dicts for rank 227 successfully loaded 8 ZeRO state_dicts for rank 85 successfully loaded 8 ZeRO state_dicts for rank 154 loading 8 zero partition checkpoints for rank 127 
successfully loaded 8 ZeRO state_dicts for rank 24 successfully loaded 8 ZeRO state_dicts for rank 241 loading 8 zero partition checkpoints for rank 96 successfully loaded 8 ZeRO state_dicts for rank 1 successfully loaded 8 ZeRO state_dicts for rank 64 successfully loaded 8 ZeRO state_dicts for rank 50 successfully loaded 8 ZeRO state_dicts for rank 42 successfully loaded 8 ZeRO state_dicts for rank 232 loading 8 zero partition checkpoints for rank 116 loading 8 zero partition checkpoints for rank 172 successfully loaded 8 ZeRO state_dicts for rank 33 successfully loaded 8 ZeRO state_dicts for rank 10 successfully loaded 8 ZeRO state_dicts for rank 31 successfully loaded 8 ZeRO state_dicts for rank 38 loading 8 zero partition checkpoints for rank 63 successfully loaded 8 ZeRO state_dicts for rank 89 successfully loaded 8 ZeRO state_dicts for rank 249 successfully loaded 8 ZeRO state_dicts for rank 246 loading 8 zero partition checkpoints for rank 58 successfully loaded 8 ZeRO state_dicts for rank 151 loading 8 zero partition checkpoints for rank 204 successfully loaded 8 ZeRO state_dicts for rank 155 successfully loaded 8 ZeRO state_dicts for rank 34 successfully loaded 8 ZeRO state_dicts for rank 250 successfully loaded 8 ZeRO state_dicts for rank 102 successfully loaded 8 ZeRO state_dicts for rank 230 successfully loaded 8 ZeRO state_dicts for rank 70 successfully loaded 8 ZeRO state_dicts for rank 26 successfully loaded 8 ZeRO state_dicts for rank 29 loading 8 zero partition checkpoints for rank 62 loading 8 zero partition checkpoints for rank 182 loading 8 zero partition checkpoints for rank 124 loading 8 zero partition checkpoints for rank 109 successfully loaded 8 ZeRO state_dicts for rank 247 successfully loaded 8 ZeRO state_dicts for rank 148 successfully loaded 8 ZeRO state_dicts for rank 30 successfully loaded 8 ZeRO state_dicts for rank 153 loading 8 zero partition checkpoints for rank 113 successfully loaded 8 ZeRO state_dicts for rank 251 successfully loaded 8 ZeRO state_dicts for rank 43 successfully loaded 8 ZeRO state_dicts for rank 235 loading 8 zero partition checkpoints for rank 177 loading 8 zero partition checkpoints for rank 200 loading 8 zero partition checkpoints for rank 214 successfully loaded 8 ZeRO state_dicts for rank 254 successfully loaded 8 ZeRO state_dicts for rank 229 loading 8 zero partition checkpoints for rank 164 loading 8 zero partition checkpoints for rank 44 loading 8 zero partition checkpoints for rank 211 loading 8 zero partition checkpoints for rank 111 loading 8 zero partition checkpoints for rank 221 loading 8 zero partition checkpoints for rank 143 successfully loaded 8 ZeRO state_dicts for rank 66 loading 8 zero partition checkpoints for rank 188 loading 8 zero partition checkpoints for rank 194 loading 8 zero partition checkpoints for rank 81 loading 8 zero partition checkpoints for rank 15 successfully loaded 8 ZeRO state_dicts for rank 231 loading 8 zero partition checkpoints for rank 207 loading 8 zero partition checkpoints for rank 107 loading 8 zero partition checkpoints for rank 160 successfully loaded 8 ZeRO state_dicts for rank 253 loading 8 zero partition checkpoints for rank 105 loading 8 zero partition checkpoints for rank 186 loading 8 zero partition checkpoints for rank 223 loading 8 zero partition checkpoints for rank 95 loading 8 zero partition checkpoints for rank 174 loading 8 zero partition checkpoints for rank 51 loading 8 zero partition checkpoints for rank 61 loading 8 zero partition checkpoints for rank 120 loading 8 
zero partition checkpoints for rank 135 loading 8 zero partition checkpoints for rank 97 loading 8 zero partition checkpoints for rank 140 loading 8 zero partition checkpoints for rank 16 loading 8 zero partition checkpoints for rank 198 loading 8 zero partition checkpoints for rank 100 loading 8 zero partition checkpoints for rank 171 loading 8 zero partition checkpoints for rank 205 loading 8 zero partition checkpoints for rank 76 successfully loaded 8 ZeRO state_dicts for rank 219 successfully loaded 8 ZeRO state_dicts for rank 255 loading 8 zero partition checkpoints for rank 20 loading 8 zero partition checkpoints for rank 126 loading 8 zero partition checkpoints for rank 55 loading 8 zero partition checkpoints for rank 175 loading 8 zero partition checkpoints for rank 99 loading 8 zero partition checkpoints for rank 36 loading 8 zero partition checkpoints for rank 199 loading 8 zero partition checkpoints for rank 166 loading 8 zero partition checkpoints for rank 158 loading 8 zero partition checkpoints for rank 157 loading 8 zero partition checkpoints for rank 82 loading 8 zero partition checkpoints for rank 129 loading 8 zero partition checkpoints for rank 222 loading 8 zero partition checkpoints for rank 215 loading 8 zero partition checkpoints for rank 121 loading 8 zero partition checkpoints for rank 115 loading 8 zero partition checkpoints for rank 181 loading 8 zero partition checkpoints for rank 134 loading 8 zero partition checkpoints for rank 21 loading 8 zero partition checkpoints for rank 87 loading 8 zero partition checkpoints for rank 201 loading 8 zero partition checkpoints for rank 197 loading 8 zero partition checkpoints for rank 13 loading 8 zero partition checkpoints for rank 173 loading 8 zero partition checkpoints for rank 132 loading 8 zero partition checkpoints for rank 195 loading 8 zero partition checkpoints for rank 178 loading 8 zero partition checkpoints for rank 69 loading 8 zero partition checkpoints for rank 65 loading 8 zero partition checkpoints for rank 125 loading 8 zero partition checkpoints for rank 138 loading 8 zero partition checkpoints for rank 208 loading 8 zero partition checkpoints for rank 45 loading 8 zero partition checkpoints for rank 39 loading 8 zero partition checkpoints for rank 196 loading 8 zero partition checkpoints for rank 130 loading 8 zero partition checkpoints for rank 35 loading 8 zero partition checkpoints for rank 165 loading 8 zero partition checkpoints for rank 209 loading 8 zero partition checkpoints for rank 213 loading 8 zero partition checkpoints for rank 190 loading 8 zero partition checkpoints for rank 189 loading 8 zero partition checkpoints for rank 114 loading 8 zero partition checkpoints for rank 12 loading 8 zero partition checkpoints for rank 54 loading 8 zero partition checkpoints for rank 98 loading 8 zero partition checkpoints for rank 49 loading 8 zero partition checkpoints for rank 142 loading 8 zero partition checkpoints for rank 136 loading 8 zero partition checkpoints for rank 19 loading 8 zero partition checkpoints for rank 163 loading 8 zero partition checkpoints for rank 159 loading 8 zero partition checkpoints for rank 94 loading 8 zero partition checkpoints for rank 88 loading 8 zero partition checkpoints for rank 67 loading 8 zero partition checkpoints for rank 106 loading 8 zero partition checkpoints for rank 149 loading 8 zero partition checkpoints for rank 73 loading 8 zero partition checkpoints for rank 218 loading 8 zero partition checkpoints for rank 59 loading 8 zero partition checkpoints 
for rank 139 loading 8 zero partition checkpoints for rank 137 loading 8 zero partition checkpoints for rank 212 loading 8 zero partition checkpoints for rank 14 loading 8 zero partition checkpoints for rank 144 loading 8 zero partition checkpoints for rank 57 loading 8 zero partition checkpoints for rank 191 loading 8 zero partition checkpoints for rank 23 loading 8 zero partition checkpoints for rank 133 loading 8 zero partition checkpoints for rank 117 loading 8 zero partition checkpoints for rank 220 loading 8 zero partition checkpoints for rank 40 loading 8 zero partition checkpoints for rank 122 loading 8 zero partition checkpoints for rank 179 loading 8 zero partition checkpoints for rank 78 loading 8 zero partition checkpoints for rank 83 loading 8 zero partition checkpoints for rank 150 loading 8 zero partition checkpoints for rank 156 loading 8 zero partition checkpoints for rank 245 loading 8 zero partition checkpoints for rank 141 loading 8 zero partition checkpoints for rank 210 loading 8 zero partition checkpoints for rank 123 loading 8 zero partition checkpoints for rank 37 loading 8 zero partition checkpoints for rank 243 loading 8 zero partition checkpoints for rank 74 loading 8 zero partition checkpoints for rank 68 loading 8 zero partition checkpoints for rank 217 loading 8 zero partition checkpoints for rank 185 loading 8 zero partition checkpoints for rank 237 loading 8 zero partition checkpoints for rank 192 loading 8 zero partition checkpoints for rank 161 loading 8 zero partition checkpoints for rank 90 loading 8 zero partition checkpoints for rank 80 loading 8 zero partition checkpoints for rank 3 loading 8 zero partition checkpoints for rank 93 loading 8 zero partition checkpoints for rank 53 loading 8 zero partition checkpoints for rank 101 loading 8 zero partition checkpoints for rank 118 loading 8 zero partition checkpoints for rank 71 loading 8 zero partition checkpoints for rank 33 loading 8 zero partition checkpoints for rank 9 loading 8 zero partition checkpoints for rank 239 loading 8 zero partition checkpoints for rank 48 loading 8 zero partition checkpoints for rank 86 loading 8 zero partition checkpoints for rank 187 loading 8 zero partition checkpoints for rank 64 loading 8 zero partition checkpoints for rank 170 loading 8 zero partition checkpoints for rank 11 loading 8 zero partition checkpoints for rank 145 loading 8 zero partition checkpoints for rank 38 loading 8 zero partition checkpoints for rank 152 loading 8 zero partition checkpoints for rank 22 loading 8 zero partition checkpoints for rank 155 loading 8 zero partition checkpoints for rank 2 loading 8 zero partition checkpoints for rank 226 loading 8 zero partition checkpoints for rank 244 loading 8 zero partition checkpoints for rank 75 loading 8 zero partition checkpoints for rank 79 loading 8 zero partition checkpoints for rank 84 loading 8 zero partition checkpoints for rank 193 loading 8 zero partition checkpoints for rank 227 loading 8 zero partition checkpoints for rank 46 loading 8 zero partition checkpoints for rank 47 loading 8 zero partition checkpoints for rank 119 loading 8 zero partition checkpoints for rank 234 loading 8 zero partition checkpoints for rank 24 loading 8 zero partition checkpoints for rank 92 loading 8 zero partition checkpoints for rank 169 loading 8 zero partition checkpoints for rank 43 loading 8 zero partition checkpoints for rank 18 loading 8 zero partition checkpoints for rank 241 loading 8 zero partition checkpoints for rank 128 loading 8 zero partition 
checkpoints for rank 77 loading 8 zero partition checkpoints for rank 162 loading 8 zero partition checkpoints for rank 246 loading 8 zero partition checkpoints for rank 151 loading 8 zero partition checkpoints for rank 72 loading 8 zero partition checkpoints for rank 41 loading 8 zero partition checkpoints for rank 91 loading 8 zero partition checkpoints for rank 26 loading 8 zero partition checkpoints for rank 147 loading 8 zero partition checkpoints for rank 224 loading 8 zero partition checkpoints for rank 50 loading 8 zero partition checkpoints for rank 216 loading 8 zero partition checkpoints for rank 85 loading 8 zero partition checkpoints for rank 148 loading 8 zero partition checkpoints for rank 131 loading 8 zero partition checkpoints for rank 32 loading 8 zero partition checkpoints for rank 247 loading 8 zero partition checkpoints for rank 8 loading 8 zero partition checkpoints for rank 66 loading 8 zero partition checkpoints for rank 229 loading 8 zero partition checkpoints for rank 146 loading 8 zero partition checkpoints for rank 235 loading 8 zero partition checkpoints for rank 17 loading 8 zero partition checkpoints for rank 236 loading 8 zero partition checkpoints for rank 254 loading 8 zero partition checkpoints for rank 202 loading 8 zero partition checkpoints for rank 70 loading 8 zero partition checkpoints for rank 42 loading 8 zero partition checkpoints for rank 250 loading 8 zero partition checkpoints for rank 89 loading 8 zero partition checkpoints for rank 251 loading 8 zero partition checkpoints for rank 228 loading 8 zero partition checkpoints for rank 103 loading 8 zero partition checkpoints for rank 225 loading 8 zero partition checkpoints for rank 34 loading 8 zero partition checkpoints for rank 203 loading 8 zero partition checkpoints for rank 231 loading 8 zero partition checkpoints for rank 25 loading 8 zero partition checkpoints for rank 238 loading 8 zero partition checkpoints for rank 255 loading 8 zero partition checkpoints for rank 102 loading 8 zero partition checkpoints for rank 154 loading 8 zero partition checkpoints for rank 219 loading 8 zero partition checkpoints for rank 230 loading 8 zero partition checkpoints for rank 240 loading 8 zero partition checkpoints for rank 10 loading 8 zero partition checkpoints for rank 27 loading 8 zero partition checkpoints for rank 0 checkpoint version 3.0 loading 8 zero partition checkpoints for rank 153 loading 8 zero partition checkpoints for rank 233 loading 8 zero partition checkpoints for rank 248 loading 8 zero partition checkpoints for rank 242 loading 8 zero partition checkpoints for rank 252 loading 8 zero partition checkpoints for rank 232 loading 8 zero partition checkpoints for rank 1 loading 8 zero partition checkpoints for rank 249 loading 8 zero partition checkpoints for rank 253 loading 8 zero partition checkpoints for rank 28 loading 8 zero partition checkpoints for rank 31 loading 8 zero partition checkpoints for rank 29 loading 8 zero partition checkpoints for rank 30 successfully loaded 8 ZeRO state_dicts for rank 5 loading 8 zero partition checkpoints for rank 5 successfully loaded 8 ZeRO state_dicts for rank 6 successfully loaded 8 ZeRO state_dicts for rank 4 successfully loaded 8 ZeRO state_dicts for rank 7 loading 8 zero partition checkpoints for rank 6 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 02:38:42 CEST)" was missed by 0:00:03.040753 loading 8 zero partition checkpoints for rank 4 
loading 8 zero partition checkpoints for rank 7
successfully loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints at iteration 5827
time (ms) | load-checkpoint: 94708.03
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-09-25 02:37:49
> building train, validation, and test datasets ...
> datasets target sizes (minimum size):
    train: 300000000
    validation: 1638400
    test: 10240
> building train, validation, and test datasets for GPT ...
> building dataset index ...
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
> finished creating indexed dataset in 0.199121 seconds
    number of documents: 304230423
> dataset split:
    train: document indices in [0, 288714672) total of 288714672 documents
    validation: document indices in [288714672, 303926193) total of 15211521 documents
    test: document indices in [303926193, 304230423) total of 304230 documents
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_shuffle_idx.npy
    loaded indexed file in 0.460 seconds
    total number of samples: 394611670
    total number of epochs: 3
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_42s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_42s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_42s_shuffle_idx.npy
    loaded indexed file in 0.335 seconds
    total number of samples: 6927161
    total number of epochs: 1
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_42s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_42s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_42s_shuffle_idx.npy
    loaded indexed file in 0.163 seconds
    total number of samples: 137384
    total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-09-25 02:37:56
done with setup ...
training ...
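Aside: the split boundaries logged above are reproducible from a 949,50,1 train/valid/test split with Megatron-style rounding (the split weights are inferred from the boundaries, not printed in the log; the rounding scheme below mirrors what Megatron-LM's get_train_valid_test_split_ appears to do, including pushing the rounding error back into all boundaries). The *_idx.npy files themselves are plain NumPy arrays and can be memory-mapped for inspection.

```python
# Reconstruct the logged document split boundaries from a "949,50,1" split.
def split_indices(weights, size):
    total = sum(weights)
    weights = [w / total for w in weights]      # normalize, e.g. 0.949/0.05/0.001
    idx = [0]
    for i, w in enumerate(weights):
        idx.append(idx[i] + int(round(w * float(size))))
    diff = idx[-1] - size                       # distribute the rounding error
    for i in range(1, len(idx)):
        idx[i] -= diff
    return idx

print(split_indices([949, 50, 1], 304230423))
# -> [0, 288714672, 303926193, 304230423], matching the logged ranges.

# The index mappings are ordinary .npy arrays; something like
#   np.load(".../meg-gpt2_text_document_train_indexmap_..._doc_idx.npy", mmap_mode="r")
# (path shortened here, illustrative) inspects them without loading them into RAM.
```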
time (ms) | model-and-optimizer-setup: 102787.57 | train/valid/test-data-iterators-setup: 6275.52
[before the start of training step] datetime: 2021-09-25 02:37:56
[2021-09-25 02:37:56,930] [INFO] [checkpointing.py:408:forward] Activation Checkpointing Information
[2021-09-25 02:37:56,931] [INFO] [checkpointing.py:409:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-09-25 02:37:56,931] [INFO] [checkpointing.py:412:forward] ----contiguous Memory Checkpointing False with 32 total layers
[2021-09-25 02:37:56,931] [INFO] [checkpointing.py:415:forward] ----Synchronization False
[2021-09-25 02:37:56,931] [INFO] [checkpointing.py:416:forward] ----Profiling time in checkpointing False
[Rank 1] (after 5830 iterations) memory (MB) | allocated: 6685.79931640625 | max allocated: 13590.94921875 | reserved: 22862.0 | max reserved: 22862.0
[Rank 225] (after 5830 iterations) memory (MB) | allocated: 7107.7109375 | max allocated: 11885.68701171875 | reserved: 22492.0 | max reserved: 22492.0
[Rank 2] (after 5830 iterations) memory (MB) | allocated: 6685.79931640625 | max allocated: 13590.94921875 | reserved: 22862.0 | max reserved: 22862.0
[Rank 226] (after 5830 iterations) memory (MB) | allocated: 7107.7109375 | max allocated: 11885.6865234375 | reserved: 20752.0 | max reserved: 20752.0
[Rank 224] (after 5830 iterations) memory (MB) | allocated: 7107.7109375 | max allocated: 11885.6875 | reserved: 22492.0 | max reserved: 22492.0
[Rank 0] (after 5830 iterations) memory (MB) | allocated: 6685.79931640625 | max allocated: 13590.94921875 | reserved: 23246.0 | max reserved: 23246.0
[Rank 3] (after 5830 iterations) memory (MB) | allocated: 6685.79931640625 | max allocated: 13590.94921875 | reserved: 22862.0 | max reserved: 22862.0
[Rank 227] (after 5830 iterations) memory (MB) | allocated: 7107.7109375 | max allocated: 11885.68701171875 | reserved: 22492.0 | max reserved: 22492.0
iteration 5830/ 159576 | consumed samples: 168368 | elapsed time per iteration (ms): 21875.4 | learning rate: 4.656E-05 | global batch size: 64 | lm loss: 6.454423E+00 | loss scale: 2048.0 | grad norm: 45630.759 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[Rank 65] (after 5830 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11810.46728515625 | reserved: 19902.0 | max reserved: 19902.0
[Rank 33] (after 5830 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 12082.4677734375 | reserved: 19866.0 | max reserved: 19866.0
[Rank 97] (after 5830 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11538.466796875 | reserved: 19402.0 | max reserved: 19402.0
[Rank 66] (after 5830 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11810.46728515625 | reserved: 19890.0 | max reserved: 19890.0
[Rank 34] (after 5830 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 12082.4677734375 | reserved: 20370.0 | max reserved: 20370.0
[Rank 193] (after 5830 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10722.46533203125 | reserved: 19066.0 | max reserved: 19066.0
[Rank 161] (after 5830 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10994.4658203125 | reserved: 19146.0 | max reserved: 19146.0
[Rank 129] (after 5830 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11266.46630859375 | reserved: 19582.0 | max reserved: 19582.0
[Rank 162] (after 5830 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10994.4658203125 | reserved: 19066.0 | max reserved: 19066.0
[Rank 130] (after 5830 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11266.46630859375 | reserved: 19434.0 | max reserved: 19434.0
[Rank 98] (after 5830 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11538.466796875 | reserved: 19674.0 | max reserved: 19674.0
[Rank 194] (after 5830 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10722.46533203125 | reserved: 19066.0 | max reserved: 19066.0
[Rank 64] (after 5830 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11810.46728515625 | reserved: 20536.0 | max reserved: 20536.0
[Rank 32] (after 5830 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 12082.4677734375 | reserved: 20408.0 | max reserved: 20408.0
[Rank 99] (after 5830 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11538.466796875 | reserved: 19838.0 | max reserved: 19838.0
[Rank 131] (after 5830 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11266.46630859375 | reserved: 19502.0 | max reserved: 19502.0
[Rank 67] (after 5830 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11810.46728515625 | reserved: 19902.0 | max reserved: 19902.0
[Rank 35] (after 5830 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 12082.4677734375 | reserved: 19866.0 | max reserved: 19866.0
[Rank 192] (after 5830 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10722.46533203125 | reserved: 19012.0 | max reserved: 19012.0
[Rank 128] (after 5830 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11266.46630859375 | reserved: 19908.0 | max reserved: 19908.0
[Rank 160] (after 5830 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10994.4658203125 | reserved: 19636.0 | max reserved: 19636.0
[Rank 96] (after 5830 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11538.466796875 | reserved: 19988.0 | max reserved: 19988.0
[Rank 163] (after 5830 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10994.4658203125 | reserved: 19306.0 | max reserved: 19306.0
[Rank 195] (after 5830 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10722.46533203125 | reserved: 18826.0 | max reserved: 18826.0
iteration 5840/ 159576 | consumed samples: 169008 | elapsed time per iteration (ms): 16822.3 | learning rate: 4.674E-05 | global batch size: 64 | lm loss: 6.392004E+00 | loss scale: 2048.0 | grad norm: 53106.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5850/ 159576 | consumed samples: 169648 | elapsed time per iteration (ms): 16813.6 | learning rate: 4.692E-05 | global batch size: 64 | lm loss: 6.347363E+00 | loss scale: 2048.0 | grad norm: 53512.430 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5860/ 159576 | consumed samples: 170288 | elapsed time per iteration (ms): 16773.5 | learning rate: 4.709E-05 | global batch size: 64 | lm loss: 6.368040E+00 | loss scale: 2048.0 | grad norm: 49687.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5870/ 159576 | consumed samples: 170928 | elapsed time per iteration (ms): 16844.9 | learning rate: 4.727E-05 | global batch size: 64 | lm loss: 6.372821E+00 | loss scale: 2048.0 | grad norm: 49107.159 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
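Aside: the [Rank N] memory lines above can be roughly sanity-checked against STAGE_PARAMS. Under ZeRO stage 1, only the optimizer state is sharded across the 8 data-parallel replicas; fp16 weights and gradients stay whole on each rank. The sketch below uses the standard mixed-precision-Adam byte counts per parameter; the breakdown is an estimate, not something the log reports.

```python
# Rough per-rank memory accounting for ranks 0-3 (pipeline stage 0),
# using the STAGE_PARAMS figure from the engine.py lines above.
stage_params = 1_986_465_792   # parameters held by this rank
dp = 8                         # data-parallel degree (world_size in the config dump)
MB = 1 << 20

fp16_weights = stage_params * 2 / MB                  # ~3789 MB
fp16_grads   = stage_params * 2 / MB                  # ~3789 MB
# Adam in mixed precision: fp32 master weights + two fp32 moments,
# sharded 8 ways by ZeRO-1:
optim_shard  = stage_params * (4 + 4 + 4) / dp / MB   # ~2842 MB

print(f"{fp16_weights:.0f} MB weights, {fp16_grads:.0f} MB grads, "
      f"{optim_shard:.0f} MB optimizer shard")
```

Weights plus optimizer shard come to roughly 6631 MB, close to the logged steady-state "allocated: 6685.8" MB for ranks 0-3; the higher "max allocated" peaks plausibly reflect gradients and activation buffers on top. Treat this as an order-of-magnitude check only.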
iteration 5880/ 159576 | consumed samples: 171568 | elapsed time per iteration (ms): 16812.2 | learning rate: 4.745E-05 | global batch size: 64 | lm loss: 6.379050E+00 | loss scale: 2048.0 | grad norm: 76898.126 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5890/ 159576 | consumed samples: 172208 | elapsed time per iteration (ms): 16819.7 | learning rate: 4.763E-05 | global batch size: 64 | lm loss: 6.333071E+00 | loss scale: 2048.0 | grad norm: 69874.656 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5900/ 159576 | consumed samples: 172848 | elapsed time per iteration (ms): 16821.3 | learning rate: 4.780E-05 | global batch size: 64 | lm loss: 6.354385E+00 | loss scale: 2048.0 | grad norm: 57915.999 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5910/ 159576 | consumed samples: 173488 | elapsed time per iteration (ms): 16679.9 | learning rate: 4.798E-05 | global batch size: 64 | lm loss: 6.361916E+00 | loss scale: 2048.0 | grad norm: 56535.869 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5920/ 159576 | consumed samples: 174128 | elapsed time per iteration (ms): 16731.8 | learning rate: 4.816E-05 | global batch size: 64 | lm loss: 6.371978E+00 | loss scale: 2048.0 | grad norm: 75613.913 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5930/ 159576 | consumed samples: 174768 | elapsed time per iteration (ms): 16796.3 | learning rate: 4.834E-05 | global batch size: 64 | lm loss: 6.373956E+00 | loss scale: 2048.0 | grad norm: 64436.905 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-25 03:08:32] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1185639_[1-10%1] on 'gpu_p13' partition)
[2021-09-25 03:08:32] PULSE: tr8-104B is running for 33:04 since 2021-09-25T02:35:28 (1185609 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n[0,7-8])
iteration 5940/ 159576 | consumed samples: 175408 | elapsed time per iteration (ms): 16680.4 | learning rate: 4.851E-05 | global batch size: 64 | lm loss: 6.367229E+00 | loss scale: 2048.0 | grad norm: 61103.619 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5950/ 159576 | consumed samples: 176048 | elapsed time per iteration (ms): 16548.2 | learning rate: 4.869E-05 | global batch size: 64 | lm loss: 6.365273E+00 | loss scale: 2048.0 | grad norm: 74137.806 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5960/ 159576 | consumed samples: 176688 | elapsed time per iteration (ms): 16720.7 | learning rate: 4.887E-05 | global batch size: 64 | lm loss: 6.339179E+00 | loss scale: 2048.0 | grad norm: 117906.851 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5970/ 159576 | consumed samples: 177328 | elapsed time per iteration (ms): 16666.6 | learning rate: 4.905E-05 | global batch size: 64 | lm loss: 6.366007E+00 | loss scale: 2048.0 | grad norm: 135736.452 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
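Aside: when comparing restarts it helps to scrape these iteration records into a table. A small illustrative parser (the function and regex are ad hoc, written for the exact field layout in this log, not part of any tooling in the repo):

```python
import re

# Pull (iteration, lm loss, loss scale, grad norm) out of a Megatron log line.
ITER_RE = re.compile(
    r"iteration\s+(\d+)/\s*\d+ \|"
    r".*?lm loss: ([\dE+.-]+) \|"
    r" loss scale: ([\d.]+) \|"
    r" grad norm: ([\d.]+)"
)

def parse_line(line):
    m = ITER_RE.search(line)
    if not m:
        return None
    it, loss, scale, gnorm = m.groups()
    return int(it), float(loss), float(scale), float(gnorm)

sample = ("iteration 5940/ 159576 | consumed samples: 175408 | "
          "elapsed time per iteration (ms): 16680.4 | learning rate: 4.851E-05 | "
          "global batch size: 64 | lm loss: 6.367229E+00 | loss scale: 2048.0 | "
          "grad norm: 61103.619 | num zeros: 0.0")
print(parse_line(sample))   # -> (5940, 6.367229, 2048.0, 61103.619)
```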
iteration 5980/ 159576 | consumed samples: 177968 | elapsed time per iteration (ms): 16712.0 | learning rate: 4.922E-05 | global batch size: 64 | lm loss: 6.311417E+00 | loss scale: 2048.0 | grad norm: 59672.483 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5990/ 159576 | consumed samples: 178608 | elapsed time per iteration (ms): 16795.9 | learning rate: 4.940E-05 | global batch size: 64 | lm loss: 6.346366E+00 | loss scale: 2048.0 | grad norm: 70394.026 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-25 03:26:24,359] [INFO] [logging.py:68:log_dist] [Rank 0] step=6000, skipped=13, lr=[4.9579881656804734e-05, 4.9579881656804734e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 6000 loss: 6.4051 iter time (s): 0.008 samples/sec: 7888.018
iteration 6000/ 159576 | consumed samples: 179248 | elapsed time per iteration (ms): 16825.1 | learning rate: 4.958E-05 | global batch size: 64 | lm loss: 6.338142E+00 | loss scale: 2048.0 | grad norm: 51469.855 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
------------------------------------------------------------------------------------------------
validation loss at iteration 6000 | lm loss value: 6.305492E+00 | lm loss PPL: 5.475711E+02 |
------------------------------------------------------------------------------------------------
saving checkpoint at iteration 6000 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
[2021-09-25 03:26:46,630] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/global_step6000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 6000 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
time (ms) | save-checkpoint: 18535.85
iteration 6010/ 159576 | consumed samples: 179888 | elapsed time per iteration (ms): 19605.0 | learning rate: 4.976E-05 | global batch size: 64 | lm loss: 6.332598E+00 | loss scale: 2048.0 | grad norm: 64216.775 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6020/ 159576 | consumed samples: 180528 | elapsed time per iteration (ms): 16682.2 | learning rate: 4.993E-05 | global batch size: 64 | lm loss: 6.346989E+00 | loss scale: 2048.0 | grad norm: 65052.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6030/ 159576 | consumed samples: 181168 | elapsed time per iteration (ms): 16536.1 | learning rate: 5.011E-05 | global batch size: 64 | lm loss: 6.314711E+00 | loss scale: 2048.0 | grad norm: 61186.621 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6040/ 159576 | consumed samples: 181808 | elapsed time per iteration (ms): 16509.4 | learning rate: 5.029E-05 | global batch size: 64 | lm loss: 6.347876E+00 | loss scale: 2048.0 | grad norm: 80684.961 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6050/ 159576 | consumed samples: 182448 | elapsed time per iteration (ms): 16821.6 | learning rate: 5.047E-05 | global batch size: 64 | lm loss: 6.345741E+00 | loss scale: 2048.0 | grad norm: 207970.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6060/ 159576 | consumed samples: 183088 | elapsed time per iteration (ms): 16815.3 | learning rate: 5.064E-05 | global batch size: 64 | lm loss: 6.341463E+00 | loss scale: 2048.0 | grad norm: 57913.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
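Aside: the PPL column in the validation report above is just the exponential of the lm loss, which is easy to verify:

```python
import math

# lm loss PPL = exp(lm loss): checking the validation line at iteration 6000.
lm_loss = 6.305492
print(math.exp(lm_loss))   # ~547.57, matching "lm loss PPL: 5.475711E+02"
```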
iteration 6070/ 159576 | consumed samples: 183728 | elapsed time per iteration (ms): 16825.8 | learning rate: 5.082E-05 | global batch size: 64 | lm loss: 6.336625E+00 | loss scale: 2048.0 | grad norm: 62496.040 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6080/ 159576 | consumed samples: 184368 | elapsed time per iteration (ms): 16749.3 | learning rate: 5.100E-05 | global batch size: 64 | lm loss: 6.378619E+00 | loss scale: 2048.0 | grad norm: 53421.784 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6090/ 159576 | consumed samples: 185008 | elapsed time per iteration (ms): 16844.2 | learning rate: 5.118E-05 | global batch size: 64 | lm loss: 6.363810E+00 | loss scale: 2048.0 | grad norm: 53621.070 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6100/ 159576 | consumed samples: 185648 | elapsed time per iteration (ms): 16803.1 | learning rate: 5.136E-05 | global batch size: 64 | lm loss: 6.397610E+00 | loss scale: 2048.0 | grad norm: 63234.859 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6110/ 159576 | consumed samples: 186288 | elapsed time per iteration (ms): 16808.5 | learning rate: 5.153E-05 | global batch size: 64 | lm loss: 6.359557E+00 | loss scale: 2048.0 | grad norm: 52582.093 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6120/ 159576 | consumed samples: 186928 | elapsed time per iteration (ms): 16792.9 | learning rate: 5.171E-05 | global batch size: 64 | lm loss: 6.347573E+00 | loss scale: 2048.0 | grad norm: 50959.488 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6130/ 159576 | consumed samples: 187568 | elapsed time per iteration (ms): 16806.7 | learning rate: 5.189E-05 | global batch size: 64 | lm loss: 6.351057E+00 | loss scale: 2048.0 | grad norm: 152670.439 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6140/ 159576 | consumed samples: 188208 | elapsed time per iteration (ms): 16808.0 | learning rate: 5.207E-05 | global batch size: 64 | lm loss: 6.374673E+00 | loss scale: 2048.0 | grad norm: 50742.188 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-25 04:08:28] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1185639_[1-10%1] on 'gpu_p13' partition)
[2021-09-25 04:08:28] PULSE: tr8-104B is running for 1:33:00 since 2021-09-25T02:35:28 (1185609 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n[0,7-8])
iteration 6150/ 159576 | consumed samples: 188848 | elapsed time per iteration (ms): 16696.6 | learning rate: 5.224E-05 | global batch size: 64 | lm loss: 6.323299E+00 | loss scale: 2048.0 | grad norm: 55101.801 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6160/ 159576 | consumed samples: 189600 | elapsed time per iteration (ms): 17385.3 | learning rate: 5.245E-05 | global batch size: 80 | lm loss: 6.368839E+00 | loss scale: 2048.0 | grad norm: 51296.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6170/ 159576 | consumed samples: 190400 | elapsed time per iteration (ms): 17823.6 | learning rate: 5.267E-05 | global batch size: 80 | lm loss: 6.355129E+00 | loss scale: 2048.0 | grad norm: 85490.491 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6180/ 159576 | consumed samples: 191200 | elapsed time per iteration (ms): 17757.4 | learning rate: 5.289E-05 | global batch size: 80 | lm loss: 6.373211E+00 | loss scale: 2048.0 | grad norm: 112584.865 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6190/ 159576 | consumed samples: 192000 | elapsed time per iteration (ms): 17583.1 | learning rate: 5.312E-05 | global batch size: 80 | lm loss: 6.372861E+00 | loss scale: 2048.0 | grad norm: 102723.952 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6200/ 159576 | consumed samples: 192800 | elapsed time per iteration (ms): 17380.3 | learning rate: 5.334E-05 | global batch size: 80 | lm loss: 6.336594E+00 | loss scale: 2048.0 | grad norm: 41950.188 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6210/ 159576 | consumed samples: 193600 | elapsed time per iteration (ms): 17443.3 | learning rate: 5.356E-05 | global batch size: 80 | lm loss: 6.355303E+00 | loss scale: 4096.0 | grad norm: 131493.175 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 6210 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
[2021-09-25 04:26:31,836] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/global_step6210/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 6210 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
time (ms) | save-checkpoint: 19730.09
[exiting program after 110.16574249267578 minutes] datetime: 2021-09-25 04:26:46
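Aside: the loss scale moving from 2048.0 to 4096.0 around iteration 6210 is the dynamic fp16 loss scaler at work. Per the config dump earlier, it starts at 2**initial_scale_power = 4096, backs off on overflow, and doubles again after loss_scale_window (500 here) consecutive overflow-free steps. A minimal sketch of that logic, simplified relative to the real DeepSpeed scaler, which also honors the hysteresis and min_loss_scale settings more carefully:

```python
# Simplified dynamic loss scaling, following the fp16 settings in the config dump:
# initial_scale_power=12 -> 2**12 = 4096, loss_scale_window=500, min_loss_scale=1.
class DynamicLossScaler:
    def __init__(self, initial_scale_power=12, window=500, min_scale=1.0):
        self.scale = 2.0 ** initial_scale_power
        self.window = window
        self.min_scale = min_scale
        self.good_steps = 0

    def update(self, overflow: bool):
        if overflow:                      # step is skipped, scale backs off
            self.scale = max(self.scale / 2, self.min_scale)
            self.good_steps = 0
        else:
            self.good_steps += 1
            if self.good_steps % self.window == 0:
                self.scale *= 2           # e.g. 2048 -> 4096 as seen above

scaler = DynamicLossScaler()
scaler.update(overflow=True)              # one overflow: 4096 -> 2048
for _ in range(500):
    scaler.update(overflow=False)         # 500 clean steps: 2048 -> 4096
print(scaler.scale)                       # 4096.0
```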
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
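The banner above is printed once per launched process. If one CPU thread per rank underutilizes the node, the variable can be exported before the heavy imports; a minimal sketch, assuming an illustrative budget of 4 threads per rank (not the tr8-104B setting):

```python
# Pin the OpenMP thread count for this process before importing torch;
# torch sizes its intra-op thread pool from OMP_NUM_THREADS at import time.
import os

os.environ.setdefault("OMP_NUM_THREADS", "4")  # "4" is illustrative only

import torch  # imported after the env var on purpose

print("intra-op threads:", torch.get_num_threads())
```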
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
[WARNING] async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
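This report is emitted by every rank and can be regenerated with DeepSpeed's `ds_report` command. The "compatible" column can also be checked programmatically; a minimal sketch, assuming DeepSpeed's public op-builder API (the two builder names are taken from the table above):

```python
# Re-derive the "compatible" column for two of the ops listed above.
# is_compatible() is the dependency check behind the [OKAY] entries.
from deepspeed.ops.op_builder import CPUAdamBuilder, FusedAdamBuilder

for builder in (CPUAdamBuilder(), FusedAdamBuilder()):
    print(builder.NAME, "compatible:", builder.is_compatible())
```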
-------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. op name ................ installed .. compatible -------------------------------------------------- async_io ............... [NO] ....... [NO] cpu_adam ............... [YES] ...... [OKAY] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . --------------------------------------------------[NO] ....... -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- [OKAY] DeepSpeed C++/CUDA extension op report -------------------------------------------------- JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
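The [YES]/[NO] columns above come from DeepSpeed probing each op builder on the node. A minimal sketch of reproducing that probe in-process, assuming the DeepSpeed 0.4.x module layout (builder names and import paths have moved between releases, so verify them against the installed version):

    from deepspeed.ops.op_builder import CPUAdamBuilder, FusedAdamBuilder

    # is_compatible() is the check behind the "compatible" column: it verifies
    # that the headers/libraries needed to JIT-build the op are present.
    for builder in (CPUAdamBuilder(), FusedAdamBuilder()):
        print(f"{builder.name}: compatible={builder.is_compatible()}")

    # Ops reported as "installed .. [NO]" are built lazily: the first load()
    # triggers a ninja JIT build (hence "JIT compiled ops requires ninja").
    cpu_adam_module = CPUAdamBuilder().load()

Pre-building an op at install time (e.g. with the documented DS_BUILD_* environment variables) flips its "installed" column to [YES], as cpu_adam shows above.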
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
[OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op reportstochastic_transformer -------------------------------------------------- . NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.[NO] --------------------------------------------------....... JIT compiled ops requires ninja[OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. -------------------------------------------------- async_io ............... [NO] ....... [NO] DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] ninja .................. [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. sparse_attn ............ [NO] ....... [OKAY] -------------------------------------------------- transformer ............ [NO] ....... [OKAY] op name ................ installed .. compatible -------------------------------------------------- async_io ............... [NO] ....... [NO] stochastic_transformer . [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] transformer_inference .. [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] fused_lamb ............. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] -------------------------------------------------- transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. DeepSpeed general environment info: async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 
11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] utils .................. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [YES] ...... [OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- async_io ............... [NO] ....... [NO] ninja .................. [OKAY] DeepSpeed C++/CUDA extension op report -------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja transformer_inference .. [NO] ....... [OKAY] op name ................ installed .. compatible -------------------------------------------------- utils .................. [YES] ...... [OKAY] cpu_adam ............... [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] -------------------------------------------------- fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report ninja .................. [OKAY] -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
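Only cpu_adam was pre-built into this wheel; the other ops show installed=[NO] but compatible=[OKAY], i.e. they get JIT-compiled with ninja the first time they are used. As a minimal sketch of what that NOTE means in practice (the op-builder import path is an assumption about how this DeepSpeed version packages its builders), an op can be forced to build ahead of the first training step:

```python
# Minimal sketch (not from the log): trigger the ninja/JIT build for an op the
# report lists as installed=[NO] but compatible=[OKAY]. The import path below
# is an assumption about how this DeepSpeed version packages its op builders.
from deepspeed.ops.op_builder import FusedAdamBuilder

builder = FusedAdamBuilder()
if builder.is_compatible():
    # load() returns the compiled extension module, JIT-building it with ninja
    # on first use and caching the result for later runs.
    fused_adam = builder.load()
    print("fused_adam op loaded:", fused_adam is not None)
```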
The same async_io warning was likewise repeated by every process:

 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.

async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
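The warning is harmless for this run, since the async_io op is not used, and on a shared cluster without root access `apt install` is not an option anyway. A minimal sketch to confirm the diagnosis, using only the standard library; the AsyncIOBuilder import is an assumption about the installed DeepSpeed version and is therefore guarded:

```python
# Minimal sketch: confirm why async_io is reported [NO] without root access.
import ctypes.util

# find_library returns a soname such as "libaio.so.1" when libaio is present,
# or None when it is missing (matching the [NO] status above).
print("libaio:", ctypes.util.find_library("aio") or "missing")

try:
    # Assumption: this builder path exists in the installed DeepSpeed version.
    from deepspeed.ops.op_builder import AsyncIOBuilder
    print("async_io compatible:", AsyncIOBuilder().is_compatible())
except ImportError:
    print("AsyncIOBuilder not importable in this DeepSpeed version")
```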
Finally, each process printed the same general environment info and Megatron git probe (git is not available inside the compute environment, hence the unknown hash and branch):

DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1

/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
[WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] .......async_io [NO]............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] utils .................. [YES] ...... [OKAY] ---------------------------------------------------------------------------------------------------- quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] async_ioasync_io .............................. [NO][NO] .............. [NO][NO] torch version .................... 1.8.1 transformer_inference ..transformer_inference [NO].. .......[NO] [OKAY]....... [OKAY] torch cuda version ............... 11.1 utils ..................utils [YES].................. ......[YES] [OKAY]...... [OKAY] nvcc version ..................... 11.2 quantizer ..............quantizer [NO].............. ....... [NO][OKAY] ....... [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] -------------------------------------------------- -------------------------------------------------- deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... async_io[NO] ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inferenceutils .................... [YES] ...... [OKAY] [NO] ....... [OKAY]quantizer .............. [NO] ....... [OKAY] utils .................. [YES]-------------------------------------------------- ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer_inference .. [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] utils .................. [YES] ...... [OKAY] transformer_inference .. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] /bin/sh: line 0: type: git: not found transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] DeepSpeed general environment info: fused_lamb ............. [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 sparse_attn ............ [NO] ....... [OKAY] torch cuda version ............... 
11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- transformer ............ [NO] ....... [OKAY] nvcc version ..................... 11.2 JIT compiled ops requires ninja stochastic_transformer . [NO] ....... [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 DeepSpeed general environment info: nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. DeepSpeed general environment info:torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] -------------------------------------------------- async_io ............... [NO] ....... [NO] torch version ....................['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] 1.8.1 torch version torch cuda version.................... ...............1.8.1 11.1 op name ................ installed .. compatible -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] torch cuda versionnvcc version .................................... 11.111.2 cpu_adam ............... [YES] ...... [OKAY] utils .................. [YES] ...... [OKAY] nvcc versiondeepspeed install path ................................ 11.2 ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed install path deepspeed info........... ................... 
0.4.2+bc17042, bc17042, big-science['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] fused_adam ............. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] deepspeed wheel compiled w.deepspeed info ......................... torch 1.8, cuda 11.10.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 fused_lamb ............. [NO] ....... [OKAY] -------------------------------------------------- sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- torch cuda version ............... 11.1 JIT compiled ops requires ninja nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info:DeepSpeed general environment info: ninja .................. [OKAY] torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 -------------------------------------------------- torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] op name ................ installed .. compatible -------------------------------------------------- deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. fused_lamb ............. [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] sparse_attn ............ [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] stochastic_transformer . [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- utils .................. [YES] ...... [OKAY] JIT compiled ops requires ninja quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.1 11.1nvcc version nvcc version..................... .....................11.2 11.2deepspeed install path ...........deepspeed install path ...........['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed info deepspeed info................... ...................0.4.2+bc17042, bc17042, big-science 0.4.2+bc17042, bc17042, big-sciencedeepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] ninja .................. 
[OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- utils .................. [YES] ...... [OKAY] cpu_adam ............... [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ninja .................. [OKAY] torch version .................... 1.8.1 -------------------------------------------------- torch cuda version ............... 11.1 op name ................ installed .. compatible -------------------------------------------------- nvcc version ..................... 11.2 cpu_adam ............... [YES] ...... [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] fused_adam ............. [NO] ....... [OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 fused_lamb ............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. stochastic_transformer . [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 -------------------------------------------------- torch cuda version ............... 11.1 nvcc version ..................... 11.2 DeepSpeed C++/CUDA extension op report deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja deepspeed info ................... 
0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. stochastic_transformer . [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] ninja .................. [OKAY] transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- utils .................. [YES] ...... [OKAY] cpu_adam ............... [YES] ...... 
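For reference, both reports can be reproduced on demand with DeepSpeed's standalone diagnostic command, without launching a training job:

    # prints the extension op report and the general environment info
    # for whichever python environment is currently active
    ds_report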
[OKAY] quantizer .............. [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] -------------------------------------------------- fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report transformer ............ [NO] ....... [OKAY] -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja stochastic_transformer . [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- utils .................. [YES] ...... [OKAY] op name ................ installed .. compatible DeepSpeed general environment info:  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] -------------------------------------------------- torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] async_io ............... [NO] ....... [NO] fused_adam ............. [NO] ....... [OKAY] torch version .................... 1.8.1 fused_lamb ............. [NO] ....... [OKAY] torch cuda version ............... 11.1 transformer_inference .. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] nvcc version ..................... 11.2 utils .................. [YES] ...... [OKAY] transformer ............ [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] -------------------------------------------------- deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- ninja .................. [OKAY] cpu_adam ............... [YES] ...... [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- fused_adam ............. [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... 
[OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. 
[WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. DeepSpeed general environment info: async_ioasync_io .............................. [NO][NO] .............. [NO][NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] async_io ............... [NO] ....... [NO] torch version .................... 1.8.1 transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] torch cuda version ............... 11.1 transformer_inference .. [NO] ....... [OKAY] utils utils.................. ..................[YES] [YES]...... ......[OKAY] nvcc version ..................... 11.2 utils .................. [YES] ...... [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] quantizer .............. [NO] ....... [OKAY] [OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- quantizer quantizer.............. ..............[NO] [NO]....... .......[OKAY] [OKAY] -------------------------------------------------- -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. ninja .................. [OKAY] async_io ............... [NO] ....... [NO] -------------------------------------------------- op name ................ installed .. compatible transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] utils .................. [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... 
[OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] DeepSpeed general environment info: utils .................. [YES] ...... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 quantizer .............. [NO] ....... [OKAY] torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] -------------------------------------------------- deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ninja .................. [OKAY] -------------------------------------------------- deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] DeepSpeed general environment info: fused_adam ............. [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] torch version .................... 1.8.1 torch cuda version ............... 11.1 transformer ............ [NO] ....... [OKAY] nvcc version ..................... 11.2 stochastic_transformer . [NO] ....... [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... 
At startup every rank prints the same DeepSpeed extension-op report and environment info, and the per-process output gets interleaved in the log. A single deduplicated copy:

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
--------------------------------------------------
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
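This report is not tied to the launcher: DeepSpeed installs a `ds_report` console script alongside the package that prints the same op-compatibility table and environment info on demand, which is handy for checking a node before submitting a job. A minimal sketch, assuming DeepSpeed is installed in the active environment:

```python
# Regenerate the op-compatibility report above on the current node.
# ds_report is the console script DeepSpeed installs with the package;
# calling it via subprocess keeps this runnable from any Python entry point.
import subprocess

subprocess.run(["ds_report"], check=True)
```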
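The async_io warning is the one real miss in the report: the op needs the libaio headers, which `apt install libaio-dev` would provide on a machine where one has root. It only matters if aio/NVMe offload is enabled, so it is harmless here. A quick way to confirm whether a node could JIT-build the op is to ask its builder directly; the import path below is an assumption for DeepSpeed 0.4.x:

```python
# Probe the async_io op the same way the startup report does.
# AsyncIOBuilder.is_compatible() returns False when the libaio headers
# are missing, which is what the [WARNING] above reflects.
from deepspeed.ops.op_builder import AsyncIOBuilder  # assumed 0.4.x path

if AsyncIOBuilder().is_compatible():
    print("async_io can be JIT-built on this node")
else:
    print("async_io unavailable: need libaio-dev (root) or skip aio/NVMe offload")
```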
compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 ninja .................. [OKAY] nvcc version ..................... 11.2 -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science /bin/sh: line 0: type: git: not found cpu_adam ............... [YES] ...... [OKAY] deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] DeepSpeed general environment info: sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] stochastic_transformer . [NO] ....... [OKAY] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja ninja.................. [OKAY].................. --------------------------------------------------[OKAY] op name --------------------------------------------------................ op nameinstalled .................. compatibleinstalled -------------------------------------------------- .. compatible -------------------------------------------------- cpu_adam ............... [YES] cpu_adam...... [OKAY]............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_adam .............fused_lamb .............[NO] [NO]....... ....... [OKAY][OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformersparse_attn ........................ [NO] [NO]....... .......[OKAY] [OKAY] stochastic_transformertransformer ............. [NO][NO] ....... .......[OKAY] [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] -------------------------------------------------- stochastic_transformer . [NO] ....... [OKAY] DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ 
[NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] DeepSpeed general environment info:DeepSpeed general environment info: transformer_inference .. [NO] ....... [OKAY] torch install path torch install path............... ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version torch version.................... ....................1.8.1 1.8.1 utils .................. [YES] ...... [OKAY] torch cuda version torch cuda version............... ...............11.1 11.1 nvcc versionnvcc version .......................................... 11.211.2 quantizer .............. [NO] ....... [OKAY] deepspeed install pathdeepspeed install path ...................... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] -------------------------------------------------- deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... DeepSpeed general environment info: ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch install path torch version............... .................... 1.8.1 torch cuda version ...............['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] 11.1 nvcc versiontorch version ......................................... 11.21.8.1 deepspeed install path torch cuda version........... ............... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']11.1 deepspeed infonvcc version ........................................ 11.20.4.2+bc17042, bc17042, big-science deepspeed install path deepspeed wheel compiled w............ ...... torch 1.8, cuda 11.1['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. 
Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ...............async_io [NO] ...................... [NO][NO] ....... [NO] transformer_inference .. transformer_inference[NO] ......... [NO][OKAY] ....... [OKAY] utils .................. [YES]utils ........................ [OKAY][YES] ...... [OKAY]quantizer .............. [NO] .......quantizer [OKAY].............. [NO] .......-------------------------------------------------- [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] -------------------------------------------------- quantizer .............. [NO] ....... 
[OKAY] DeepSpeed C++/CUDA extension op report -------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... 
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****

git is not on the PATH of the compute nodes, so the startup banner's `type git` check fails and Megatron records the git hash and branch as unknown.
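A rough shell equivalent of what the banner tries to do (the exact lookup inside Megatron may differ; this is a sketch of the fallback behaviour seen above):

```bash
# If git is on PATH and we are inside a checkout, record hash and branch;
# otherwise fall back to "unknown", which is what happens on these compute
# nodes since `type git` fails.
if type git > /dev/null 2>&1; then
    git_hash=$(git rev-parse --short HEAD 2>/dev/null || echo unknown)
    git_branch=$(git rev-parse --abbrev-ref HEAD 2>/dev/null || echo unknown)
else
    git_hash=unknown
    git_branch=unknown
fi
echo "**** Git info for Megatron: git_hash=$git_hash git_branch=$git_branch ****"
```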
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
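These versions are easy to cross-check from inside the activated environment, which is useful when the interleaved logs are hard to read:

```bash
# Cross-check the versions reported in the environment block above.
python -c 'import torch; print(torch.__version__, torch.version.cuda)'   # expect: 1.8.1 11.1
nvcc --version | tail -n1                                                # expect: release 11.2
python -c 'import deepspeed; print(deepspeed.__version__)'               # expect: 0.4.2+bc17042
```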
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
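The async_io op is only needed for NVMe offload, so the warning appears benign for this training; the fix is what the warning itself suggests, plus a rebuild with the op enabled. A sketch, assuming root access for `apt` (on a managed cluster like this one the library would have to come from the admins or a module):

```bash
# Install the async I/O development headers the op needs (needs root).
apt install libaio-dev

# Rebuild DeepSpeed with the async_io op enabled so the report shows
# async_io ............... [OKAY].
DS_BUILD_AIO=1 pip install deepspeed --no-cache-dir
```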
-------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. stochastic_transformer . [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ................... ...................0.4.2+bc17042, bc17042, big-science 0.4.2+bc17042, bc17042, big-sciencedeepspeed wheel compiled w. deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. 
[NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- op name ................ installed .. compatible /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. 
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
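The `git: not found` noise and the unknown git hash/branch come from Megatron shelling out to git at startup; on compute nodes without git on PATH it falls back to "unknown" rather than aborting. A minimal sketch of that pattern (the helper name is hypothetical, not Megatron's actual function):

    import subprocess

    def git_info(default="unknown"):
        # Mirrors the fallback seen above: if git is missing or the tree is
        # not a repository, report "unknown" instead of failing the launch.
        try:
            git_hash = subprocess.check_output(
                ["git", "rev-parse", "--short", "HEAD"], text=True).strip()
            git_branch = subprocess.check_output(
                ["git", "rev-parse", "--abbrev-ref", "HEAD"], text=True).strip()
            return git_hash, git_branch
        except (OSError, subprocess.CalledProcessError):
            return default, default

    print("**** Git info for Megatron: git_hash=%s git_branch=%s ****" % git_info())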
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
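async_io is the only op reported as incompatible ([NO] in both columns): it needs the libaio headers, which the warning suggests installing via `apt install libaio-dev`. A minimal sketch to re-test compatibility once the library is present, assuming AsyncIOBuilder is exported by this DeepSpeed release:

    # Re-check the async_io op after installing libaio-dev; is_compatible()
    # should flip to True, and the op can then be JIT-compiled on first use.
    from deepspeed.ops.op_builder import AsyncIOBuilder

    print("async_io compatible:", AsyncIOBuilder().is_compatible())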
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] .......
[OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science DeepSpeed general environment info: deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... DeepSpeed general environment info: ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch install pathtorch version ................................... 1.8.1 torch cuda version ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']11.1 nvcc versiontorch version ......................................... 
11.21.8.1 deepspeed install path torch cuda version........... ............... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']11.1 deepspeed infonvcc version ........................................ 0.4.2+bc17042, bc17042, big-science11.2 deepspeed wheel compiled w.deepspeed install path ................. torch 1.8, cuda 11.1 ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: DeepSpeed general environment info:torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'].................... 1.8.1 torch version ....................torch cuda version 1.8.1............... 11.1 torch cuda versionnvcc version .................................... 11.111.2 nvcc versiondeepspeed install path ................................ 
11.2 ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed install path ...........deepspeed info ................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']0.4.2+bc17042, bc17042, big-science deepspeed infodeepspeed wheel compiled w. ......................... 0.4.2+bc17042, bc17042, big-sciencetorch 1.8, cuda 11.1 deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference ..transformer_inference [NO].. .......[NO] [OKAY]....... [OKAY] utils ..................utils [YES].................. ......[YES] [OKAY]...... [OKAY] quantizer ..............quantizer [NO].............. .......[NO] [OKAY]....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path ............... ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version torch version.................... ....................1.8.1 1.8.1 torch cuda version torch cuda version............... ...............11.1 11.1nvcc version nvcc version..................... .....................11.2 11.2deepspeed install path deepspeed install path........... ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer_inference .. [NO] ....... [OKAY] async_ioutils ................................. [NO][YES] ............. [NO][OKAY] quantizer .............. [NO] ....... [OKAY] transformer_inference-------------------------------------------------- .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science 0.4.2+bc17042, bc17042, big-sciencedeepspeed wheel compiled w. deepspeed wheel compiled w....... torch 1.8, cuda 11.1...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: DeepSpeed general environment info:torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version ....................['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] 1.8.1 torch versiontorch cuda version ................................... 1.8.111.1 nvcc versiontorch cuda version .................................... 11.211.1 deepspeed install pathnvcc version ................................ 11.2['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed install path deepspeed info........... ................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']0.4.2+bc17042, bc17042, big-science deepspeed infodeepspeed wheel compiled w. ......................... torch 1.8, cuda 11.1 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO]utils ......................... [OKAY][YES] ...... [OKAY] utils quantizer.................. ..............[YES] [NO]...... ....... [OKAY] [OKAY] -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.DeepSpeed general environment info: ...... torch 1.8, cuda 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. 
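The environment header above is printed once per rank; if the run needs to be audited later, its fields can be re-derived from Python. A minimal sketch, assuming only that torch and deepspeed import from the same conda environment the job used:

    # Minimal sketch: re-derive the fields of the "DeepSpeed general
    # environment info" header above. Assumes torch and deepspeed are
    # importable from the environment the job ran in.
    import torch
    import deepspeed

    print("torch install path ...", list(torch.__path__))   # site-packages path
    print("torch version ........", torch.__version__)      # 1.8.1 in this run
    print("torch cuda version ...", torch.version.cuda)     # 11.1 in this run
    print("deepspeed install path", list(deepspeed.__path__))
    print("deepspeed info .......", deepspeed.__version__)  # 0.4.2+bc17042 here

The op-compatibility table itself is what DeepSpeed's `ds_report` utility prints, so it can be regenerated without rerunning the job.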
using world size: 256, data-parallel-size: 8, tensor-model-parallel size: 4, pipeline-model-parallel size: 8
using torch.float16 for parameters ...
------------------------ arguments ------------------------
accumulate_allreduce_grads_in_fp32 .............. False
adam_beta1 ...................................... 0.9
adam_beta2 ...................................... 0.999
adam_eps ........................................ 1e-08
adlr_autoresume ................................. False
adlr_autoresume_interval ........................ 1000
apply_query_key_layer_scaling ................... True
apply_residual_connection_post_layernorm ........ False
attention_dropout ............................... 0.1
attention_softmax_in_fp32 ....................... False
bert_binary_head ................................ True
bert_load ....................................... None
bf16 ............................................ False
bias_dropout_fusion ............................. True
bias_gelu_fusion ................................ True
biencoder_projection_dim ........................ 0
biencoder_shared_query_context_model ............ False
block_data_path ................................. None
checkpoint_activations .......................... True
checkpoint_in_cpu ............................... False
checkpoint_num_layers ........................... 1
clip_grad ....................................... 1.0
codecarbon_dir .................................. /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/tr8-104B-logs/codecarbon
consumed_train_samples .......................... 0
consumed_valid_samples .......................... 0
contigious_checkpointing ........................ False
cpu_optimizer ................................... False
cpu_torch_adam .................................. False
data_impl ....................................... mmap
data_parallel_size .............................. 8
data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
dataloader_type ................................. single
DDP_impl ........................................ local
decoder_seq_length .............................. None
deepscale ....................................... False
deepscale_config ................................ None
deepspeed ....................................... True
deepspeed_activation_checkpointing .............. True
deepspeed_config ................................ ./ds_config.1186600.json
deepspeed_mpi ................................... False
distribute_checkpointed_activations ............. False
distributed_backend ............................. nccl
embedding_path .................................. None
encoder_seq_length .............................. 2048
eod_mask_loss ................................... False
eval_interval ................................... 1000
eval_iters ...................................... 5
evidence_data_path .............................. None
exit_duration_in_mins ........................... 1190
exit_interval ................................... None
ffn_hidden_size ................................. 20480
finetune ........................................ False
fp16 ............................................ True
fp16_lm_cross_entropy ........................... False
fp32_residual_connection ........................ False
global_batch_size ............................... 2048
hidden_dropout .................................. 0.1
hidden_size ..................................... 16384
hysteresis ...................................... 2
ict_head_size ................................... None
ict_load ........................................ None
img_dim ......................................... 224
indexer_batch_size .............................. 128
indexer_log_interval ............................ 1000
init_method_std ................................. 0.02
init_method_xavier_uniform ...................... False
initial_loss_scale .............................. 4294967296
kv_channels ..................................... 512
layernorm_epsilon ............................... 1e-05
lazy_mpu_init ................................... None
load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
local_rank ...................................... 0
log_batch_size_to_tensorboard ................... True
log_interval .................................... 10
log_learning_rate_to_tensorboard ................ True
log_loss_scale_to_tensorboard ................... True
log_num_zeros_in_grad ........................... False
log_params_norm ................................. False
log_timers_to_tensorboard ....................... True
log_validation_ppl_to_tensorboard ............... True
loss_scale ...................................... 12.0
loss_scale_window ............................... 1000
lr .............................................. 6e-05
lr_decay_iters .................................. None
lr_decay_samples ................................ 126953125
lr_decay_style .................................. cosine
lr_warmup_fraction .............................. None
lr_warmup_iters ................................. 0
lr_warmup_samples ............................... 216320
make_vocab_size_divisible_by .................... 128
mask_prob ....................................... 0.15
masked_softmax_fusion ........................... True
max_position_embeddings ......................... 2048
memory_centric_tiled_linear ..................... False
merge_file ...................................... /gpfswork/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/data/gpt2-merges.txt
micro_batch_size ................................ 1
min_loss_scale .................................. 1.0
min_lr .......................................... 6e-06
mmap_warmup ..................................... False
no_load_optim ................................... None
no_load_rng ..................................... None
no_save_optim ................................... None
no_save_rng ..................................... None
num_attention_heads ............................. 32
num_channels .................................... 3
num_classes ..................................... 1000
num_layers ...................................... 32
num_layers_per_virtual_pipeline_stage ........... None
num_workers ..................................... 2
onnx_safe ....................................... None
openai_gelu ..................................... False
optimizer ....................................... adam
override_lr_scheduler ........................... False
params_dtype .................................... torch.float16
partition_activations ........................... False
patch_dim ....................................... 16
pipeline_model_parallel_size .................... 8
position_embedding_type ......................... PositionEmbeddingType.absolute
profile_backward ................................ False
query_in_block_prob ............................. 0.1
rampup_batch_size ............................... ['16', '16', '6_000_000']
rank ............................................ 0
remote_device ................................... none
reset_attention_mask ............................ False
reset_position_ids .............................. False
retriever_report_topk_accuracies ................ []
retriever_score_scaling ......................... False
retriever_seq_length ............................ 256
sample_rate ..................................... 1.0
save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
save_interval ................................... 1500
scatter_gather_tensors_in_pipeline .............. True
scattered_embeddings ............................ False
seed ............................................ 42
seq_length ...................................... 2048
sgd_momentum .................................... 0.9
short_seq_prob .................................. 0.1
split ........................................... 949,50,1
split_transformers .............................. False
synchronize_each_layer .......................... False
tensor_model_parallel_size ...................... 4
tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/tr8-104B-logs/tensorboard
tensorboard_log_interval ........................ 1
tensorboard_queue_size .......................... 5
tile_factor ..................................... 1
titles_data_path ................................ None
tokenizer_name_or_path .......................... None
tokenizer_type .................................. GPT2BPETokenizer
train_iters ..................................... None
train_samples ................................... 300000000
use_checkpoint_lr_scheduler ..................... False
use_contiguous_buffers_in_ddp ................... False
use_cpu_initialization .......................... None
use_one_sent_docs ............................... False
use_pin_memory .................................. False
virtual_pipeline_model_parallel_size ............ None
vocab_extra_ids ................................. 0
vocab_file ...................................... /gpfswork/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/data/gpt2-vocab.json
weight_decay .................................... 0.1
world_size ...................................... 256
zero_allgather_bucket_size ...................... 0.0
zero_contigious_gradients ....................... False
zero_reduce_bucket_size ......................... 0.0
zero_reduce_scatter ............................. False
zero_stage ...................................... 1
-------------------- end of arguments ---------------------
will use batch size rampup starting from global batch size 16 to global batch size 2048 with batch size increments 16 over 6000000 samples.
> building GPT2BPETokenizer tokenizer ...
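Two of the printed values are worth checking against each other: the world size must factor into the three parallelism degrees, and rampup_batch_size = ['16', '16', '6_000_000'] encodes the schedule restated in the "will use batch size rampup" line. An illustrative check, not Megatron-DeepSpeed code, that just redoes the arithmetic from the log:

    # Illustrative check of the topology and batch-size rampup printed above.
    world_size = 256
    dp, tp, pp = 8, 4, 8                # data / tensor / pipeline parallel
    assert world_size == dp * tp * pp   # 8 * 4 * 8 == 256

    start, increment, rampup_samples = 16, 16, 6_000_000
    target = 2048                       # global_batch_size
    n_sizes = (target - start) // increment + 1
    print(n_sizes)                      # 128 distinct batch sizes
    # Assuming samples spread evenly across sizes (an approximation of the
    # actual scheduler), each size is held for about this many samples:
    print(rampup_samples // n_sizes)    # 46875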
[OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** > padded vocab (size: 50257) with 431 dummy tokens (new size: 50688) > setting tensorboard ... DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version ....................DeepSpeed general environment info: 1.8.1 torch cuda version ............... 11.1 torch install pathnvcc version .................................... 11.2 deepspeed install path ........... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infotorch version ....................................... 0.4.2+bc17042, bc17042, big-science1.8.1 deepspeed wheel compiled w. ......torch cuda version torch 1.8, cuda 11.1............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** > setting codecarbon ... > initializing torch distributed ... -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... 
> initializing tensor model parallel with size 4
> initializing pipeline model parallel with size 8
> setting random seeds to 42 ...
[2021-09-25 04:27:14,118] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2760 and data parallel seed: 42
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/data'
>>> done with dataset index builder. Compilation time: 0.302 seconds
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
!! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension.
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! WARNING !!
warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
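The seed line a few entries up splits one base seed into per-group seeds. A sketch of what DeepSpeed's model_parallel_cuda_manual_seed does (not a verbatim copy): model-parallel ranks get a constant offset of 2718 plus their rank, so dropout patterns differ across tensor-parallel partitions, while data-parallel replicas keep the base seed:

    def split_seeds(base_seed, mp_rank):
        # 2718 is the constant offset used upstream; each model-parallel
        # rank gets a distinct CUDA seed, data-parallel ranks share the base.
        model_parallel_seed = base_seed + 2718 + mp_rank
        data_parallel_seed = base_seed
        return model_parallel_seed, data_parallel_seed

    print(split_seeds(42, 0))  # (2760, 42) -- the values logged above for rank 0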
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/fused_kernels/build/build.ninja...
Building extension module fused_mix_prec_layer_norm_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
>>> done with compiling and loading fused kernels. Compilation time: 23.417 seconds
time to initialize megatron (seconds): 25.843
[after megatron is initialized] datetime: 2021-09-25 04:27:37
building GPT model ...
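The "Emitting ninja build file ... / Loading extension module ..." lines in the block above come from PyTorch's JIT extension loader, which is also what emits the compiler-mismatch warning once per rank. A minimal sketch of how one of these fused kernels gets built and loaded via torch.utils.cpp_extension.load; the source paths and flags here are illustrative, not the actual Megatron build options:

    from torch.utils import cpp_extension

    # cpp_extension.load drives ninja under the hood, which is why
    # "ninja: no work to do." appears when the cached build under
    # .../fused_kernels/build is already current.
    scaled_softmax = cpp_extension.load(
        name="scaled_upper_triang_masked_softmax_cuda",
        sources=[  # illustrative paths into megatron/fused_kernels/
            "megatron/fused_kernels/scaled_upper_triang_masked_softmax.cpp",
            "megatron/fused_kernels/scaled_upper_triang_masked_softmax_cuda.cu",
        ],
        extra_cuda_cflags=["-O3"],  # illustrative; Megatron passes more flags
        verbose=True,  # prints the "Emitting ninja build file ..." lines
    )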
[2021-09-25 04:27:37,906] [INFO] [utils.py:680:see_memory_usage] Before Building Model
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/cuda/memory.py:373: FutureWarning: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved
  warnings.warn(
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/cuda/memory.py:381: FutureWarning: torch.cuda.max_memory_cached has been renamed to torch.cuda.max_memory_reserved
  warnings.warn(
[2021-09-25 04:27:37,908] [INFO] [utils.py:681:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
[2021-09-25 04:27:37,908] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 36.69 GB, percent = 19.6%
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=0, data=1, model=0): 4, ProcessCoord(pipe=0, data=1, model=1): 5, ProcessCoord(pipe=0, data=1, model=2): 6, ProcessCoord(pipe=0, data=1, model=3): 7, ProcessCoord(pipe=0, data=2, model=0): 8, ProcessCoord(pipe=0, data=2, model=1): 9, ProcessCoord(pipe=0, data=2, model=2): 10, ProcessCoord(pipe=0, data=2, model=3): 11, ProcessCoord(pipe=0, data=3, model=0): 12, ProcessCoord(pipe=0, data=3, model=1): 13, ProcessCoord(pipe=0, data=3, model=2): 14, ProcessCoord(pipe=0, data=3, model=3): 15, ProcessCoord(pipe=0, data=4, model=0): 16, ProcessCoord(pipe=0, data=4, model=1): 17, ProcessCoord(pipe=0, data=4, model=2): 18, ProcessCoord(pipe=0, data=4, model=3): 19, ProcessCoord(pipe=0, data=5, model=0): 20, ProcessCoord(pipe=0, data=5, model=1): 21, ProcessCoord(pipe=0, data=5, model=2): 22, ProcessCoord(pipe=0, data=5, model=3): 23, ProcessCoord(pipe=0, data=6, model=0): 24, ProcessCoord(pipe=0, data=6, model=1): 25, ProcessCoord(pipe=0, data=6, model=2): 26, ProcessCoord(pipe=0, data=6, model=3): 27, ProcessCoord(pipe=0, data=7, model=0): 28, ProcessCoord(pipe=0, data=7, model=1): 29, ProcessCoord(pipe=0, data=7, model=2): 30, ProcessCoord(pipe=0, data=7, model=3): 31, ProcessCoord(pipe=1, data=0, model=0): 32, ProcessCoord(pipe=1, data=0, model=1): 33, ProcessCoord(pipe=1, data=0, model=2): 34, ProcessCoord(pipe=1, data=0, model=3): 35, ProcessCoord(pipe=1, data=1, model=0): 36, ProcessCoord(pipe=1, data=1, model=1): 37, ProcessCoord(pipe=1, data=1, model=2): 38, ProcessCoord(pipe=1, data=1, model=3): 39, ProcessCoord(pipe=1, data=2, model=0): 40, ProcessCoord(pipe=1, data=2, model=1): 41, ProcessCoord(pipe=1, data=2, model=2): 42, ProcessCoord(pipe=1, data=2, model=3): 43, ProcessCoord(pipe=1, data=3, model=0): 44, ProcessCoord(pipe=1, data=3, model=1): 45, ProcessCoord(pipe=1, data=3, model=2): 46, ProcessCoord(pipe=1, data=3, model=3): 47, ProcessCoord(pipe=1, data=4, model=0): 48, ProcessCoord(pipe=1, data=4, model=1): 49, ProcessCoord(pipe=1, data=4, model=2): 50, ProcessCoord(pipe=1, data=4, model=3): 51, ProcessCoord(pipe=1, data=5, model=0): 52, ProcessCoord(pipe=1, data=5, model=1): 53, ProcessCoord(pipe=1, data=5, model=2): 54, ProcessCoord(pipe=1, data=5, model=3): 55, ProcessCoord(pipe=1, data=6, model=0): 56, ProcessCoord(pipe=1, data=6, model=1): 57, ProcessCoord(pipe=1, data=6, model=2): 58, ProcessCoord(pipe=1, data=6, model=3): 59, ProcessCoord(pipe=1, data=7, model=0): 60, ProcessCoord(pipe=1, data=7, model=1): 61, ProcessCoord(pipe=1, data=7, model=2): 62, ProcessCoord(pipe=1, data=7, model=3): 63, ProcessCoord(pipe=2, data=0,
model=0): 64, ProcessCoord(pipe=2, data=0, model=1): 65, ProcessCoord(pipe=2, data=0, model=2): 66, ProcessCoord(pipe=2, data=0, model=3): 67, ProcessCoord(pipe=2, data=1, model=0): 68, ProcessCoord(pipe=2, data=1, model=1): 69, ProcessCoord(pipe=2, data=1, model=2): 70, ProcessCoord(pipe=2, data=1, model=3): 71, ProcessCoord(pipe=2, data=2, model=0): 72, ProcessCoord(pipe=2, data=2, model=1): 73, ProcessCoord(pipe=2, data=2, model=2): 74, ProcessCoord(pipe=2, data=2, model=3): 75, ProcessCoord(pipe=2, data=3, model=0): 76, ProcessCoord(pipe=2, data=3, model=1): 77, ProcessCoord(pipe=2, data=3, model=2): 78, ProcessCoord(pipe=2, data=3, model=3): 79, ProcessCoord(pipe=2, data=4, model=0): 80, ProcessCoord(pipe=2, data=4, model=1): 81, ProcessCoord(pipe=2, data=4, model=2): 82, ProcessCoord(pipe=2, data=4, model=3): 83, ProcessCoord(pipe=2, data=5, model=0): 84, ProcessCoord(pipe=2, data=5, model=1): 85, ProcessCoord(pipe=2, data=5, model=2): 86, ProcessCoord(pipe=2, data=5, model=3): 87, ProcessCoord(pipe=2, data=6, model=0): 88, ProcessCoord(pipe=2, data=6, model=1): 89, ProcessCoord(pipe=2, data=6, model=2): 90, ProcessCoord(pipe=2, data=6, model=3): 91, ProcessCoord(pipe=2, data=7, model=0): 92, ProcessCoord(pipe=2, data=7, model=1): 93, ProcessCoord(pipe=2, data=7, model=2): 94, ProcessCoord(pipe=2, data=7, model=3): 95, ProcessCoord(pipe=3, data=0, model=0): 96, ProcessCoord(pipe=3, data=0, model=1): 97, ProcessCoord(pipe=3, data=0, model=2): 98, ProcessCoord(pipe=3, data=0, model=3): 99, ProcessCoord(pipe=3, data=1, model=0): 100, ProcessCoord(pipe=3, data=1, model=1): 101, ProcessCoord(pipe=3, data=1, model=2): 102, ProcessCoord(pipe=3, data=1, model=3): 103, ProcessCoord(pipe=3, data=2, model=0): 104, ProcessCoord(pipe=3, data=2, model=1): 105, ProcessCoord(pipe=3, data=2, model=2): 106, ProcessCoord(pipe=3, data=2, model=3): 107, ProcessCoord(pipe=3, data=3, model=0): 108, ProcessCoord(pipe=3, data=3, model=1): 109, ProcessCoord(pipe=3, data=3, model=2): 110, ProcessCoord(pipe=3, data=3, model=3): 111, ProcessCoord(pipe=3, data=4, model=0): 112, ProcessCoord(pipe=3, data=4, model=1): 113, ProcessCoord(pipe=3, data=4, model=2): 114, ProcessCoord(pipe=3, data=4, model=3): 115, ProcessCoord(pipe=3, data=5, model=0): 116, ProcessCoord(pipe=3, data=5, model=1): 117, ProcessCoord(pipe=3, data=5, model=2): 118, ProcessCoord(pipe=3, data=5, model=3): 119, ProcessCoord(pipe=3, data=6, model=0): 120, ProcessCoord(pipe=3, data=6, model=1): 121, ProcessCoord(pipe=3, data=6, model=2): 122, ProcessCoord(pipe=3, data=6, model=3): 123, ProcessCoord(pipe=3, data=7, model=0): 124, ProcessCoord(pipe=3, data=7, model=1): 125, ProcessCoord(pipe=3, data=7, model=2): 126, ProcessCoord(pipe=3, data=7, model=3): 127, ProcessCoord(pipe=4, data=0, model=0): 128, ProcessCoord(pipe=4, data=0, model=1): 129, ProcessCoord(pipe=4, data=0, model=2): 130, ProcessCoord(pipe=4, data=0, model=3): 131, ProcessCoord(pipe=4, data=1, model=0): 132, ProcessCoord(pipe=4, data=1, model=1): 133, ProcessCoord(pipe=4, data=1, model=2): 134, ProcessCoord(pipe=4, data=1, model=3): 135, ProcessCoord(pipe=4, data=2, model=0): 136, ProcessCoord(pipe=4, data=2, model=1): 137, ProcessCoord(pipe=4, data=2, model=2): 138, ProcessCoord(pipe=4, data=2, model=3): 139, ProcessCoord(pipe=4, data=3, model=0): 140, ProcessCoord(pipe=4, data=3, model=1): 141, ProcessCoord(pipe=4, data=3, model=2): 142, ProcessCoord(pipe=4, data=3, model=3): 143, ProcessCoord(pipe=4, data=4, model=0): 144, ProcessCoord(pipe=4, data=4, model=1): 145, 
ProcessCoord(pipe=4, data=4, model=2): 146, ProcessCoord(pipe=4, data=4, model=3): 147, ProcessCoord(pipe=4, data=5, model=0): 148, ProcessCoord(pipe=4, data=5, model=1): 149, ProcessCoord(pipe=4, data=5, model=2): 150, ProcessCoord(pipe=4, data=5, model=3): 151, ProcessCoord(pipe=4, data=6, model=0): 152, ProcessCoord(pipe=4, data=6, model=1): 153, ProcessCoord(pipe=4, data=6, model=2): 154, ProcessCoord(pipe=4, data=6, model=3): 155, ProcessCoord(pipe=4, data=7, model=0): 156, ProcessCoord(pipe=4, data=7, model=1): 157, ProcessCoord(pipe=4, data=7, model=2): 158, ProcessCoord(pipe=4, data=7, model=3): 159, ProcessCoord(pipe=5, data=0, model=0): 160, ProcessCoord(pipe=5, data=0, model=1): 161, ProcessCoord(pipe=5, data=0, model=2): 162, ProcessCoord(pipe=5, data=0, model=3): 163, ProcessCoord(pipe=5, data=1, model=0): 164, ProcessCoord(pipe=5, data=1, model=1): 165, ProcessCoord(pipe=5, data=1, model=2): 166, ProcessCoord(pipe=5, data=1, model=3): 167, ProcessCoord(pipe=5, data=2, model=0): 168, ProcessCoord(pipe=5, data=2, model=1): 169, ProcessCoord(pipe=5, data=2, model=2): 170, ProcessCoord(pipe=5, data=2, model=3): 171, ProcessCoord(pipe=5, data=3, model=0): 172, ProcessCoord(pipe=5, data=3, model=1): 173, ProcessCoord(pipe=5, data=3, model=2): 174, ProcessCoord(pipe=5, data=3, model=3): 175, ProcessCoord(pipe=5, data=4, model=0): 176, ProcessCoord(pipe=5, data=4, model=1): 177, ProcessCoord(pipe=5, data=4, model=2): 178, ProcessCoord(pipe=5, data=4, model=3): 179, ProcessCoord(pipe=5, data=5, model=0): 180, ProcessCoord(pipe=5, data=5, model=1): 181, ProcessCoord(pipe=5, data=5, model=2): 182, ProcessCoord(pipe=5, data=5, model=3): 183, ProcessCoord(pipe=5, data=6, model=0): 184, ProcessCoord(pipe=5, data=6, model=1): 185, ProcessCoord(pipe=5, data=6, model=2): 186, ProcessCoord(pipe=5, data=6, model=3): 187, ProcessCoord(pipe=5, data=7, model=0): 188, ProcessCoord(pipe=5, data=7, model=1): 189, ProcessCoord(pipe=5, data=7, model=2): 190, ProcessCoord(pipe=5, data=7, model=3): 191, ProcessCoord(pipe=6, data=0, model=0): 192, ProcessCoord(pipe=6, data=0, model=1): 193, ProcessCoord(pipe=6, data=0, model=2): 194, ProcessCoord(pipe=6, data=0, model=3): 195, ProcessCoord(pipe=6, data=1, model=0): 196, ProcessCoord(pipe=6, data=1, model=1): 197, ProcessCoord(pipe=6, data=1, model=2): 198, ProcessCoord(pipe=6, data=1, model=3): 199, ProcessCoord(pipe=6, data=2, model=0): 200, ProcessCoord(pipe=6, data=2, model=1): 201, ProcessCoord(pipe=6, data=2, model=2): 202, ProcessCoord(pipe=6, data=2, model=3): 203, ProcessCoord(pipe=6, data=3, model=0): 204, ProcessCoord(pipe=6, data=3, model=1): 205, ProcessCoord(pipe=6, data=3, model=2): 206, ProcessCoord(pipe=6, data=3, model=3): 207, ProcessCoord(pipe=6, data=4, model=0): 208, ProcessCoord(pipe=6, data=4, model=1): 209, ProcessCoord(pipe=6, data=4, model=2): 210, ProcessCoord(pipe=6, data=4, model=3): 211, ProcessCoord(pipe=6, data=5, model=0): 212, ProcessCoord(pipe=6, data=5, model=1): 213, ProcessCoord(pipe=6, data=5, model=2): 214, ProcessCoord(pipe=6, data=5, model=3): 215, ProcessCoord(pipe=6, data=6, model=0): 216, ProcessCoord(pipe=6, data=6, model=1): 217, ProcessCoord(pipe=6, data=6, model=2): 218, ProcessCoord(pipe=6, data=6, model=3): 219, ProcessCoord(pipe=6, data=7, model=0): 220, ProcessCoord(pipe=6, data=7, model=1): 221, ProcessCoord(pipe=6, data=7, model=2): 222, ProcessCoord(pipe=6, data=7, model=3): 223, ProcessCoord(pipe=7, data=0, model=0): 224, ProcessCoord(pipe=7, data=0, model=1): 225, ProcessCoord(pipe=7, data=0, 
model=2): 226, ProcessCoord(pipe=7, data=0, model=3): 227, ProcessCoord(pipe=7, data=1, model=0): 228, ProcessCoord(pipe=7, data=1, model=1): 229, ProcessCoord(pipe=7, data=1, model=2): 230, ProcessCoord(pipe=7, data=1, model=3): 231, ProcessCoord(pipe=7, data=2, model=0): 232, ProcessCoord(pipe=7, data=2, model=1): 233, ProcessCoord(pipe=7, data=2, model=2): 234, ProcessCoord(pipe=7, data=2, model=3): 235, ProcessCoord(pipe=7, data=3, model=0): 236, ProcessCoord(pipe=7, data=3, model=1): 237, ProcessCoord(pipe=7, data=3, model=2): 238, ProcessCoord(pipe=7, data=3, model=3): 239, ProcessCoord(pipe=7, data=4, model=0): 240, ProcessCoord(pipe=7, data=4, model=1): 241, ProcessCoord(pipe=7, data=4, model=2): 242, ProcessCoord(pipe=7, data=4, model=3): 243, ProcessCoord(pipe=7, data=5, model=0): 244, ProcessCoord(pipe=7, data=5, model=1): 245, ProcessCoord(pipe=7, data=5, model=2): 246, ProcessCoord(pipe=7, data=5, model=3): 247, ProcessCoord(pipe=7, data=6, model=0): 248, ProcessCoord(pipe=7, data=6, model=1): 249, ProcessCoord(pipe=7, data=6, model=2): 250, ProcessCoord(pipe=7, data=6, model=3): 251, ProcessCoord(pipe=7, data=7, model=0): 252, ProcessCoord(pipe=7, data=7, model=1): 253, ProcessCoord(pipe=7, data=7, model=2): 254, ProcessCoord(pipe=7, data=7, model=3): 255} [2021-09-25 04:27:39,312] [INFO] [module.py:360:_partition_layers] Partitioning pipeline stages with method type:transformer stage=0 layers=7 0: _to_float16 1: EmbeddingPipe 2: 3: ParallelTransformerLayerPipe 4: ParallelTransformerLayerPipe 5: ParallelTransformerLayerPipe 6: ParallelTransformerLayerPipe stage=1 layers=4 7: ParallelTransformerLayerPipe 8: ParallelTransformerLayerPipe 9: ParallelTransformerLayerPipe 10: ParallelTransformerLayerPipe stage=2 layers=4 11: ParallelTransformerLayerPipe 12: ParallelTransformerLayerPipe 13: ParallelTransformerLayerPipe 14: ParallelTransformerLayerPipe stage=3 layers=4 15: ParallelTransformerLayerPipe 16: ParallelTransformerLayerPipe 17: ParallelTransformerLayerPipe 18: ParallelTransformerLayerPipe stage=4 layers=4 19: ParallelTransformerLayerPipe 20: ParallelTransformerLayerPipe 21: ParallelTransformerLayerPipe 22: ParallelTransformerLayerPipe stage=5 layers=4 23: ParallelTransformerLayerPipe 24: ParallelTransformerLayerPipe 25: ParallelTransformerLayerPipe 26: ParallelTransformerLayerPipe stage=6 layers=4 27: ParallelTransformerLayerPipe 28: ParallelTransformerLayerPipe 29: ParallelTransformerLayerPipe 30: ParallelTransformerLayerPipe stage=7 layers=8 31: ParallelTransformerLayerPipe 32: ParallelTransformerLayerPipe 33: ParallelTransformerLayerPipe 34: ParallelTransformerLayerPipe 35: 36: MixedFusedLayerNorm 37: EmbeddingPipe 38: float16_to_fp32 loss: CrossEntropy > number of parameters on (tensor, pipeline) model parallel rank (2, 2): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (0, 1): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (0, 2): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (2, 1): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (3, 1): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (1, 1): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (1, 4): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (0, 6): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (0, 5): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (3, 5): 
1745293312 > number of parameters on (tensor, pipeline) model parallel rank (2, 4): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (1, 5): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (2, 5): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (0, 4): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (3, 4): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (1, 6): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (1, 3): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (2, 6): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (2, 3): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (3, 6): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (3, 3): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (0, 3): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (1, 2): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (3, 2): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (2, 0): 1986465792 > number of parameters on (tensor, pipeline) model parallel rank (0, 7): 1986498560 > number of parameters on (tensor, pipeline) model parallel rank (1, 0): 1986465792 > number of parameters on (tensor, pipeline) model parallel rank (2, 7): 1986498560 > number of parameters on (tensor, pipeline) model parallel rank (1, 7): 1986498560 > number of parameters on (tensor, pipeline) model parallel rank (3, 0): 1986465792 > number of parameters on (tensor, pipeline) model parallel rank (3, 7): 1986498560 [2021-09-25 04:27:40,518] [INFO] [utils.py:680:see_memory_usage] After Building Model [2021-09-25 04:27:40,519] [INFO] [utils.py:681:see_memory_usage] MA 3.77 GB Max_MA 3.79 GB CA 3.79 GB Max_CA 4 GB [2021-09-25 04:27:40,519] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 36.87 GB, percent = 19.7% > number of parameters on (tensor, pipeline) model parallel rank (0, 0): 1986465792 setting training iterations to 159576 > learning rate decay style: cosine DeepSpeed is enabled. 
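The topology listing above is completely regular: global ranks are assigned with the tensor-parallel (model) coordinate varying fastest, then the data-parallel coordinate, then the pipeline stage. A minimal sketch (plain Python, not the DeepSpeed topology API) that reproduces the mapping for this PP=8 / DP=8 / TP=4 grid of 256 GPUs:

```python
# Sketch: reproduce the ProcessCoord -> global rank mapping printed above.
# The grid sizes come from the log itself (8 pipe x 8 data x 4 model = 256).
PP, DP, TP = 8, 8, 4

def coord_to_rank(pipe: int, data: int, model: int) -> int:
    """Global rank for a (pipe, data, model) coordinate, model fastest."""
    return (pipe * DP + data) * TP + model

# Spot-check against entries visible in the log:
assert coord_to_rank(0, 0, 3) == 3
assert coord_to_rank(1, 0, 0) == 32    # first rank of pipeline stage 1
assert coord_to_rank(4, 4, 2) == 146
assert coord_to_rank(7, 7, 3) == 255   # last rank
```

This is also why the engine summary further down groups ranks 0-31 into stage 0, 32-63 into stage 1, and so on.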
[2021-09-25 04:27:40,540] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.4.2+bc17042, git-hash=bc17042, git-branch=big-science
[2021-09-25 04:27:40,690] [INFO] [engine.py:179:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-09-25 04:27:40,690] [INFO] [engine.py:736:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-09-25 04:27:40,690] [INFO] [engine.py:741:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-09-25 04:27:40,690] [INFO] [engine.py:750:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2021-09-25 04:27:40,690] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=<class 'apex.optimizers.fused_adam.FusedAdam'>
[2021-09-25 04:27:40,690] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2021-09-25 04:27:40,690] [INFO] [stage2.py:106:__init__] Reduce bucket size 500000000
[2021-09-25 04:27:40,690] [INFO] [stage2.py:107:__init__] Allgather bucket size 500000000
[2021-09-25 04:27:40,691] [INFO] [stage2.py:108:__init__] CPU Offload: False
[2021-09-25 04:27:40,691] [INFO] [stage2.py:109:__init__] Round robin gradient partitioning: False
[2021-09-25 04:27:45,267] [INFO] [stage2.py:419:__init__] optimizer state initialized
[2021-09-25 04:27:45,267] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2021-09-25 04:27:45,267] [INFO] [engine.py:553:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2021-09-25 04:27:45,267] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2021-09-25 04:27:45,267] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-09-25 04:27:45,267] [INFO] [config.py:900:print] DeepSpeedEngine configuration:
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print]   activation_checkpointing_config  {
    "partition_activations": false,
    "contiguous_memory_optimization": false,
    "cpu_checkpointing": false,
    "number_checkpoints": null,
    "synchronize_checkpoint_boundary": false,
    "profile": false
}
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print]   aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print]   allreduce_always_fp32 ........ False
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print]   amp_enabled .................. False
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print]   amp_params ................... False
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print]   checkpoint_tag_validation_enabled  True
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print]   checkpoint_tag_validation_fail  False
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print]   disable_allgather ............ False
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print]   dump_state ................... False
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print]   dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print]   eigenvalue_enabled ........... False
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print]   eigenvalue_gas_boundary_resolution  1
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print]   eigenvalue_layer_name ........ bert.encoder.layer
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print]   eigenvalue_layer_num ......... 0
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print]   eigenvalue_max_iter .......... 100
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print]   eigenvalue_stability ......... 1e-06
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print]   eigenvalue_tol ............... 0.01
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print]   eigenvalue_verbose ........... False
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print]   elasticity_enabled ........... False
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print]   flops_profiler_config ........ {
    "enabled": false,
    "profile_step": 1,
    "module_depth": -1,
    "top_modules": 1,
    "detailed": true,
    "output_file": null
}
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print]   fp16_enabled ................. True
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print]   fp16_mixed_quantize .......... False
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print]   global_rank .................. 0
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print]   gradient_accumulation_steps .. 256
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print]   gradient_clipping ............ 1.0
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print]   gradient_predivide_factor .... 1.0
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print]   initial_dynamic_scale ........ 4096
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print]   loss_scale ................... 0
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print]   memory_breakdown ............. False
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print]   optimizer_legacy_fusion ...... False
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print]   optimizer_name ............... None
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print]   optimizer_params ............. None
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print]   pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print]   pld_enabled .................. False
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print]   pld_params ................... False
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print]   prescale_gradients ........... False
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print]   quantize_change_rate ......... 0.001
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print]   quantize_groups .............. 1
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print]   quantize_offset .............. 1000
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print]   quantize_period .............. 1000
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print]   quantize_rounding ............ 0
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print]   quantize_start_bits .......... 16
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print]   quantize_target_bits ......... 8
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print]   quantize_training_enabled .... False
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print]   quantize_type ................ 0
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print]   quantize_verbose ............. False
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print]   scheduler_name ............... None
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print]   scheduler_params ............. None
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print]   sparse_attention ............. None
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print]   sparse_gradients_enabled ..... False
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print]   steps_per_print .............. 2000
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print]   tensorboard_enabled .......... False
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print]   tensorboard_job_name ......... DeepSpeedJobName
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print]   tensorboard_output_path ......
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print]   train_batch_size ............. 2048
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print]   train_micro_batch_size_per_gpu  1
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print]   use_quantizer_kernel ......... False
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print]   wall_clock_breakdown ......... False
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print]   world_size ................... 8
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print]   zero_allow_untested_optimizer  False
[2021-09-25 04:27:45,270] [INFO] [config.py:904:print]   zero_config .................. {
    "stage": 1,
    "contiguous_gradients": false,
    "reduce_scatter": true,
    "reduce_bucket_size": 5.000000e+08,
    "allgather_partitions": true,
    "allgather_bucket_size": 5.000000e+08,
    "overlap_comm": false,
    "load_from_fp32_weights": true,
    "elastic_checkpoint": true,
    "offload_param": null,
    "offload_optimizer": null,
    "sub_group_size": 1.000000e+09,
    "prefetch_bucket_size": 5.000000e+07,
    "param_persistence_threshold": 1.000000e+05,
    "max_live_parameters": 1.000000e+09,
    "max_reuse_distance": 1.000000e+09,
    "gather_fp16_weights_on_model_save": false,
    "ignore_unused_parameters": true,
    "round_robin_gradients": false,
    "legacy_stage1": false
}
[2021-09-25 04:27:45,270] [INFO] [config.py:904:print]   zero_enabled ................. True
[2021-09-25 04:27:45,270] [INFO] [config.py:904:print]   zero_optimization_stage ...... 1
[2021-09-25 04:27:45,270] [INFO] [config.py:906:print]   json = {
    "train_micro_batch_size_per_gpu": 1,
    "train_batch_size": 2.048000e+03,
    "gradient_clipping": 1.0,
    "zero_optimization": {
        "stage": 1
    },
    "fp16": {
        "enabled": true,
        "loss_scale": 0,
        "loss_scale_window": 500,
        "hysteresis": 2,
        "min_loss_scale": 1,
        "initial_scale_power": 12
    },
    "steps_per_print": 2.000000e+03,
    "wall_clock_breakdown": false
}
[2021-09-25 04:27:45,270] [INFO] [engine.py:76:__init__] CONFIG: micro_batches=256 micro_batch_size=1
[2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=0 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=32 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[... equivalent lines for stages 2-6 (ranks 64-67, 96-99, 128-131, 160-163, 192-195), all with LAYERS=4 and STAGE_PARAMS=1745293312 ...]
[2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=224 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
 > using checkpoint value 6e-05 for learning rate
 > using checkpoint value 6e-06 for minimum learning rate
 > using checkpoint value 216320 for warmup iterations
 > using checkpoint value 126953125 for total number of iterations
 > using checkpoint value cosine for decay style
successfully loaded 8 ZeRO state_dicts for rank 48
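Two of the printed values are derived rather than set directly, and they can be sanity-checked in a few lines. A quick sketch (plain Python; the variable names are mine, the numbers are copied from the config dump above):

```python
# Sketch: verify the derived values in the DeepSpeed config dump above.
train_micro_batch_size_per_gpu = 1
gradient_accumulation_steps = 256
world_size = 8            # here this is the data-parallel replica count
initial_scale_power = 12

# DeepSpeed's global batch invariant:
# micro batch * gradient accumulation steps * data-parallel world size
train_batch_size = (train_micro_batch_size_per_gpu
                    * gradient_accumulation_steps
                    * world_size)
assert train_batch_size == 2048           # matches train_batch_size above

# "loss_scale: 0" selects dynamic loss scaling; the starting scale is
# 2**initial_scale_power.
assert 2 ** initial_scale_power == 4096   # matches initial_dynamic_scale
```

Note that world_size=8 is the data-parallel size only: the full job has 256 GPUs, but PP=8 and TP=4 account for a factor of 32.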
[... "successfully loaded 8 ZeRO state_dicts for rank N" and "loading 8 zero partition checkpoints for rank N" messages repeat, interleaved, until every one of the 256 ranks has reported both ...]
loading 8 zero partition checkpoints for rank 0
 checkpoint version 3.0
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 04:30:09 CEST)" was missed by 0:00:03.764782
 successfully loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints at iteration 6210
 time (ms) | load-checkpoint: 91691.46
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-09-25 04:29:17
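Each rank reports loading 8 ZeRO state_dicts, which is consistent with ZeRO stage 1 sharding optimizer state across the 8 data-parallel replicas: with elastic_checkpoint true, the checkpoint holds one optimizer shard per replica, and on load a rank reads all shards for its pipeline/tensor slice and re-extracts its own partition. A toy sketch of the stage-1 partitioning idea (my own simplification in plain numpy, not DeepSpeed's actual code; all sizes are illustrative):

```python
# Toy sketch of ZeRO stage 1: every data-parallel replica keeps the full
# fp16 weights, but optimizer state is partitioned across the DP group.
import numpy as np

dp_size = 8                               # matches this run's DP degree
n_params = 1_000_000                      # stand-in for one rank's model shard
flat_params = np.zeros(n_params, dtype=np.float16)

# Each DP rank owns optimizer state for only its 1/dp_size slice of the
# flattened parameter buffer.
bounds = np.linspace(0, n_params, dp_size + 1, dtype=np.int64)
own = {r: slice(bounds[r], bounds[r + 1]) for r in range(dp_size)}

rank = 3                                  # example data-parallel rank
master = flat_params[own[rank]].astype(np.float32)  # fp32 master weights
exp_avg = np.zeros_like(master)           # Adam first moment, sharded
exp_avg_sq = np.zeros_like(master)        # Adam second moment, sharded

# The per-rank optimizer state is 1/dp_size of the unsharded size.
assert master.size * dp_size == n_params
```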
> building train, validation, and test datasets ...
 > datasets target sizes (minimum size):
    train:      300000000
    validation: 1638400
    test:       10240
> building train, validation, and test datasets for GPT ...
 > building dataset index ...
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
 > finished creating indexed dataset in 0.138486 seconds
    number of documents: 304230423
 > dataset split:
    train:      document indices in [0, 288714672) total of 288714672 documents
    validation: document indices in [288714672, 303926193) total of 15211521 documents
    test:       document indices in [303926193, 304230423) total of 304230 documents
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_shuffle_idx.npy
    loaded indexed file in 0.350 seconds
    total number of samples: 394611670
    total number of epochs: 3
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_42s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_42s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_42s_shuffle_idx.npy
    loaded indexed file in 0.276 seconds
    total number of samples: 6927161
    total number of epochs: 1
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_42s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_42s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_42s_shuffle_idx.npy
    loaded indexed file in 0.080 seconds
    total number of samples: 137384
    total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-09-25 04:29:23
done with setup ...
training ...
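The "3 epochs" on the train split follows from the numbers above: one pass over the 288,714,672 training documents yields roughly 394,611,670 / 3 ≈ 131.5M sequences of 2048 tokens, so three passes are the minimum needed to cover the 300M-sample target. A quick order-of-magnitude check (plain Python; Megatron derives the exact sample count slightly differently, this is just the rounding logic):

```python
import math

target_train_samples = 300_000_000     # from "datasets target sizes" above
total_samples_3_epochs = 394_611_670   # reported for the train split
samples_per_epoch = total_samples_3_epochs / 3   # ~131.5M sequences/epoch

# Two epochs would fall short of the target; three is the smallest cover.
assert 2 * samples_per_epoch < target_train_samples < 3 * samples_per_epoch
print(math.ceil(target_train_samples / samples_per_epoch))  # -> 3
```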
time (ms) | model-and-optimizer-setup: 99723.96 | train/valid/test-data-iterators-setup: 5641.98
[before the start of training step] datetime: 2021-09-25 04:29:23
[2021-09-25 04:29:23,929] [INFO] [checkpointing.py:408:forward] Activation Checkpointing Information
[2021-09-25 04:29:23,930] [INFO] [checkpointing.py:409:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-09-25 04:29:23,930] [INFO] [checkpointing.py:412:forward] ----contiguous Memory Checkpointing False with 32 total layers
[2021-09-25 04:29:23,930] [INFO] [checkpointing.py:415:forward] ----Synchronization False
[2021-09-25 04:29:23,930] [INFO] [checkpointing.py:416:forward] ----Profiling time in checkpointing False
[Rank 1] (after 6220 iterations) memory (MB) | allocated: 6689.83056640625 | max allocated: 13899.01416015625 | reserved: 23406.0 | max reserved: 23406.0
[Rank 225] (after 6220 iterations) memory (MB) | allocated: 7107.7119140625 | max allocated: 11885.68994140625 | reserved: 21700.0 | max reserved: 21700.0
[Rank 226] (after 6220 iterations) memory (MB) | allocated: 7107.7119140625 | max allocated: 11885.6884765625 | reserved: 22492.0 | max reserved: 22492.0
[Rank 2] (after 6220 iterations) memory (MB) | allocated: 6689.83056640625 | max allocated: 13899.01416015625 | reserved: 23406.0 | max reserved: 23406.0
[Rank 0] (after 6220 iterations) memory (MB) | allocated: 6689.83056640625 | max allocated: 13899.01416015625 | reserved: 23726.0 | max reserved: 23726.0
[Rank 224] (after 6220 iterations) memory (MB) | allocated: 7107.7119140625 | max allocated: 11885.68896484375 | reserved: 22492.0 | max reserved: 22492.0
[Rank 3] (after 6220 iterations) memory (MB) | allocated: 6689.83056640625 | max allocated: 13899.01416015625 | reserved: 23374.0 | max reserved: 23374.0
[Rank 227] (after 6220 iterations) memory (MB) | allocated: 7107.7119140625 | max allocated: 11885.68994140625 | reserved: 22492.0 | max reserved: 22492.0
iteration 6220/ 159576 | consumed samples: 194400 | elapsed time per iteration (ms): 18925.1 | learning rate: 5.378E-05 | global batch size: 80 | lm loss: 6.332304E+00 | loss scale: 4096.0 | grad norm: 207900.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[Rank 33] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 12082.4677734375 | reserved: 20130.0 | max reserved: 20130.0
[Rank 97] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11538.466796875 | reserved: 19402.0 | max reserved: 19402.0
[Rank 161] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10994.4658203125 | reserved: 18826.0 | max reserved: 18826.0
[Rank 193] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10722.46533203125 | reserved: 18826.0 | max reserved: 18826.0
[Rank 129] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11266.46630859375 | reserved: 19662.0 | max reserved: 19662.0
[Rank 65] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11810.46728515625 | reserved: 19946.0 | max reserved: 19946.0
[Rank 34] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 12082.4677734375 | reserved: 20170.0 | max reserved: 20170.0
[Rank 162] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10994.4658203125 | reserved: 18826.0 | max reserved: 18826.0
[Rank 130] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11266.46630859375 | reserved: 19390.0 | max reserved: 19390.0
[Rank 98] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11538.466796875 | reserved: 19722.0 | max reserved: 19722.0
[Rank 194] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10722.46533203125 | reserved: 18826.0 | max reserved: 18826.0
[Rank 66] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11810.46728515625 | reserved: 20094.0 | max reserved: 20094.0
[Rank 32] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 12082.4677734375 | reserved: 20456.0 | max reserved: 20456.0
[Rank 128] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11266.46630859375 | reserved: 19908.0 | max reserved: 19908.0
[Rank 96] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11538.466796875 | reserved: 19828.0 | max reserved: 19828.0
[Rank 64] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11810.46728515625 | reserved: 20328.0 | max reserved: 20328.0
[Rank 192] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10722.46533203125 | reserved: 19396.0 | max reserved: 19396.0
[Rank 160] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10994.4658203125 | reserved: 19572.0 | max reserved: 19572.0
[Rank 99] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11538.466796875 | reserved: 19662.0 | max reserved: 19662.0
[Rank 67] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11810.46728515625 | reserved: 19966.0 | max reserved: 19966.0
[Rank 131] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11266.46630859375 | reserved: 19578.0 | max reserved: 19578.0
[Rank 35] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 12082.4677734375 | reserved: 20078.0 | max reserved: 20078.0
[Rank 195] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10722.46533203125 | reserved: 18842.0 | max reserved: 18842.0
[Rank 163] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10994.4658203125 | reserved: 19066.0 | max reserved: 19066.0
iteration 6230/ 159576 | consumed samples: 195200 | elapsed time per iteration (ms): 17419.3 | learning rate: 5.400E-05 | global batch size: 80 | lm loss: 6.312761E+00 | loss scale: 4096.0 | grad norm: 102010.658 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6240/ 159576 | consumed samples: 196000 | elapsed time per iteration (ms): 17458.3 | learning rate: 5.423E-05 | global batch size: 80 | lm loss: 6.325917E+00 | loss scale: 4096.0 | grad norm: 139671.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6250/ 159576 | consumed samples: 196800 | elapsed time per iteration (ms): 17438.0 | learning rate: 5.445E-05 | global batch size: 80 | lm loss: 6.330989E+00 | loss scale: 4096.0 | grad norm: 117429.787 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6260/ 159576 | consumed samples: 197600 | elapsed time per iteration (ms): 17495.4 | learning rate: 5.467E-05 | global batch size: 80 | lm loss: 6.330341E+00 | loss scale: 4096.0 | grad norm: 101380.992 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
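The per-rank memory lines above are CUDA allocator counters converted to MB. A minimal sketch of how such a line can be produced with PyTorch's public API (the helper name report_memory is hypothetical, not the code this run used):

    import torch

    def report_memory(rank: int, iteration: int) -> str:
        mb = 1024 * 1024  # counters are in bytes; the log prints MB
        return (f"[Rank {rank}] (after {iteration} iterations) memory (MB) | "
                f"allocated: {torch.cuda.memory_allocated() / mb} | "
                f"max allocated: {torch.cuda.max_memory_allocated() / mb} | "
                f"reserved: {torch.cuda.memory_reserved() / mb} | "
                f"max reserved: {torch.cuda.max_memory_reserved() / mb}")

"reserved" exceeds "allocated" because the caching allocator keeps freed blocks around for reuse; the large gap on rank 0 (~6.7 GB allocated vs ~23.7 GB reserved) is cached memory, not a leak.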
iteration 6270/ 159576 | consumed samples: 198400 | elapsed time per iteration (ms): 17488.9 | learning rate: 5.489E-05 | global batch size: 80 | lm loss: 6.304220E+00 | loss scale: 4096.0 | grad norm: 137994.450 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6280/ 159576 | consumed samples: 199200 | elapsed time per iteration (ms): 17456.9 | learning rate: 5.511E-05 | global batch size: 80 | lm loss: 6.302861E+00 | loss scale: 4096.0 | grad norm: 117645.788 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6290/ 159576 | consumed samples: 200000 | elapsed time per iteration (ms): 16818.4 | learning rate: 5.531E-05 | global batch size: 80 | lm loss: 6.313686E+00 | loss scale: 4096.0 | grad norm: 87880.797 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6300/ 159576 | consumed samples: 200800 | elapsed time per iteration (ms): 17519.8 | learning rate: 5.554E-05 | global batch size: 80 | lm loss: 6.270583E+00 | loss scale: 4096.0 | grad norm: 86063.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6310/ 159576 | consumed samples: 201600 | elapsed time per iteration (ms): 17461.4 | learning rate: 5.576E-05 | global batch size: 80 | lm loss: 6.315401E+00 | loss scale: 4096.0 | grad norm: 120394.115 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6320/ 159576 | consumed samples: 202400 | elapsed time per iteration (ms): 17455.8 | learning rate: 5.598E-05 | global batch size: 80 | lm loss: 6.326277E+00 | loss scale: 4096.0 | grad norm: 95784.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6330/ 159576 | consumed samples: 203200 | elapsed time per iteration (ms): 17431.8 | learning rate: 5.620E-05 | global batch size: 80 | lm loss: 6.333566E+00 | loss scale: 4096.0 | grad norm: 119951.862 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6340/ 159576 | consumed samples: 204000 | elapsed time per iteration (ms): 16668.3 | learning rate: 5.640E-05 | global batch size: 80 | lm loss: 6.321040E+00 | loss scale: 2048.0 | grad norm: 54351.143 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-25 05:08:29] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1185639_[2-10%1] on 'gpu_p13' partition)
[2021-09-25 05:08:29] PULSE: tr8-104B is running for 41:28 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)
iteration 6350/ 159576 | consumed samples: 204800 | elapsed time per iteration (ms): 17330.6 | learning rate: 5.662E-05 | global batch size: 80 | lm loss: 6.297153E+00 | loss scale: 2048.0 | grad norm: 61555.753 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6360/ 159576 | consumed samples: 205600 | elapsed time per iteration (ms): 17390.9 | learning rate: 5.684E-05 | global batch size: 80 | lm loss: 6.296333E+00 | loss scale: 2048.0 | grad norm: 67211.747 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
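Between iterations 6330 and 6340 the loss scale halves from 4096.0 to 2048.0: the fp16 gradients overflowed once and DeepSpeed's dynamic loss scaler dropped the scale (Megatron's own "number of skipped iterations" stays 0 here because the skip is handled inside DeepSpeed; the cumulative count surfaces later in the step=8000 line as skipped=17). A minimal sketch of the halve-on-overflow, grow-when-stable policy (illustrative, not the actual DeepSpeed code; the window length is a placeholder):

    class DynamicLossScaler:
        def __init__(self, scale=4096.0, window=1000):
            self.scale, self.window, self.good_steps = scale, window, 0

        def update(self, found_overflow: bool):
            if found_overflow:        # drop the step, shrink the scale
                self.scale /= 2       # e.g. 4096.0 -> 2048.0 near iter 6340
                self.good_steps = 0
            else:
                self.good_steps += 1
                if self.good_steps == self.window:  # stable long enough
                    self.scale *= 2   # e.g. 2048.0 -> 4096.0 near iter 6840
                    self.good_steps = 0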
number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6370/ 159576 | consumed samples: 206400 | elapsed time per iteration (ms): 17338.2 | learning rate: 5.707E-05 | global batch size: 80 | lm loss: 6.309451E+00 | loss scale: 2048.0 | grad norm: 66671.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6380/ 159576 | consumed samples: 207200 | elapsed time per iteration (ms): 17380.7 | learning rate: 5.729E-05 | global batch size: 80 | lm loss: 6.301356E+00 | loss scale: 2048.0 | grad norm: 45299.990 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6390/ 159576 | consumed samples: 208000 | elapsed time per iteration (ms): 17366.7 | learning rate: 5.751E-05 | global batch size: 80 | lm loss: 6.335297E+00 | loss scale: 2048.0 | grad norm: 59836.646 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6400/ 159576 | consumed samples: 208800 | elapsed time per iteration (ms): 17383.7 | learning rate: 5.773E-05 | global batch size: 80 | lm loss: 6.303946E+00 | loss scale: 2048.0 | grad norm: 55594.564 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6410/ 159576 | consumed samples: 209600 | elapsed time per iteration (ms): 17402.0 | learning rate: 5.795E-05 | global batch size: 80 | lm loss: 6.335719E+00 | loss scale: 2048.0 | grad norm: 63504.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6420/ 159576 | consumed samples: 210400 | elapsed time per iteration (ms): 17371.7 | learning rate: 5.818E-05 | global batch size: 80 | lm loss: 6.278386E+00 | loss scale: 2048.0 | grad norm: 252963.122 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6430/ 159576 | consumed samples: 211200 | elapsed time per iteration (ms): 17394.4 | learning rate: 5.840E-05 | global batch size: 80 | lm loss: 6.309026E+00 | loss scale: 2048.0 | grad norm: 70987.021 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6440/ 159576 | consumed samples: 212000 | elapsed time per iteration (ms): 17385.8 | learning rate: 5.862E-05 | global batch size: 80 | lm loss: 6.352011E+00 | loss scale: 2048.0 | grad norm: 57730.647 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6450/ 159576 | consumed samples: 212800 | elapsed time per iteration (ms): 17363.4 | learning rate: 5.884E-05 | global batch size: 80 | lm loss: 6.338916E+00 | loss scale: 2048.0 | grad norm: 74089.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6460/ 159576 | consumed samples: 213600 | elapsed time per iteration (ms): 17402.1 | learning rate: 5.906E-05 | global batch size: 80 | lm loss: 6.307239E+00 | loss scale: 2048.0 | grad norm: 43748.712 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6470/ 159576 | consumed samples: 214400 | elapsed time per iteration (ms): 17495.0 | learning rate: 5.929E-05 | global batch size: 80 | lm loss: 6.336151E+00 | loss scale: 2048.0 | grad norm: 39508.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6480/ 159576 | consumed samples: 215200 | elapsed time per iteration (ms): 17462.6 | learning rate: 5.951E-05 | 
global batch size: 80 | lm loss: 6.356039E+00 | loss scale: 2048.0 | grad norm: 37602.564 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6490/ 159576 | consumed samples: 216000 | elapsed time per iteration (ms): 17419.0 | learning rate: 5.973E-05 | global batch size: 80 | lm loss: 6.355389E+00 | loss scale: 2048.0 | grad norm: 44833.008 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6500/ 159576 | consumed samples: 216800 | elapsed time per iteration (ms): 17489.2 | learning rate: 5.995E-05 | global batch size: 80 | lm loss: 6.336482E+00 | loss scale: 2048.0 | grad norm: 54162.793 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6510/ 159576 | consumed samples: 217600 | elapsed time per iteration (ms): 17458.7 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.337574E+00 | loss scale: 2048.0 | grad norm: 54595.463 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6520/ 159576 | consumed samples: 218400 | elapsed time per iteration (ms): 17515.2 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.356417E+00 | loss scale: 2048.0 | grad norm: 49879.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6530/ 159576 | consumed samples: 219200 | elapsed time per iteration (ms): 17447.6 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.369381E+00 | loss scale: 2048.0 | grad norm: 60963.731 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6540/ 159576 | consumed samples: 220000 | elapsed time per iteration (ms): 17448.8 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.338880E+00 | loss scale: 2048.0 | grad norm: 59382.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6550/ 159576 | consumed samples: 220800 | elapsed time per iteration (ms): 17544.1 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.331310E+00 | loss scale: 2048.0 | grad norm: 62265.638 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-25 06:08:34] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1185639_[2-10%1] on 'gpu_p13' partition) [2021-09-25 06:08:34] PULSE: tr8-104B is running for 1:41:33 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0) iteration 6560/ 159576 | consumed samples: 221600 | elapsed time per iteration (ms): 17470.3 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.312242E+00 | loss scale: 2048.0 | grad norm: 58830.808 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6570/ 159576 | consumed samples: 222400 | elapsed time per iteration (ms): 17497.8 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.305868E+00 | loss scale: 2048.0 | grad norm: 95845.470 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6580/ 159576 | consumed samples: 223200 | elapsed time per iteration (ms): 17465.4 | learning rate: 6.000E-05 | global batch 
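The learning rate saturates at its 6.000E-05 maximum between iterations 6500 and 6510 and stays flat from then on, consistent with a sample-based linear warmup. A minimal sketch of that schedule (warmup_samples below is a placeholder chosen to match the logged saturation point, not the run's verified flag):

    def lr_at(consumed_samples: int,
              max_lr: float = 6.000e-05,
              warmup_samples: int = 217_000) -> float:   # placeholder value
        # linear ramp by consumed samples, then constant
        if consumed_samples < warmup_samples:
            return max_lr * consumed_samples / warmup_samples
        return max_lr   # the flat 6.000E-05 seen from ~iteration 6510 on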
iteration 6590/ 159576 | consumed samples: 224000 | elapsed time per iteration (ms): 17539.4 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.324122E+00 | loss scale: 2048.0 | grad norm: 68019.685 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6600/ 159576 | consumed samples: 224800 | elapsed time per iteration (ms): 17523.7 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.367977E+00 | loss scale: 2048.0 | grad norm: 72056.426 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6610/ 159576 | consumed samples: 225600 | elapsed time per iteration (ms): 17492.9 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.308113E+00 | loss scale: 2048.0 | grad norm: 149731.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6620/ 159576 | consumed samples: 226400 | elapsed time per iteration (ms): 17537.3 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.354418E+00 | loss scale: 2048.0 | grad norm: 62412.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6630/ 159576 | consumed samples: 227200 | elapsed time per iteration (ms): 17517.5 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.357222E+00 | loss scale: 2048.0 | grad norm: 85289.584 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6640/ 159576 | consumed samples: 228000 | elapsed time per iteration (ms): 17515.1 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.340989E+00 | loss scale: 2048.0 | grad norm: 56974.928 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6650/ 159576 | consumed samples: 228800 | elapsed time per iteration (ms): 17504.4 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.343948E+00 | loss scale: 2048.0 | grad norm: 94205.551 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6660/ 159576 | consumed samples: 229600 | elapsed time per iteration (ms): 17528.5 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.349052E+00 | loss scale: 2048.0 | grad norm: 59116.810 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6670/ 159576 | consumed samples: 230400 | elapsed time per iteration (ms): 17539.0 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.319823E+00 | loss scale: 2048.0 | grad norm: 89145.444 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6680/ 159576 | consumed samples: 231200 | elapsed time per iteration (ms): 17492.6 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.322467E+00 | loss scale: 2048.0 | grad norm: 79513.773 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6690/ 159576 | consumed samples: 232000 | elapsed time per iteration (ms): 17427.8 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.351400E+00 | loss scale: 2048.0 | grad norm: 80270.152 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6700/ 159576 | consumed samples: 232800 | elapsed time per iteration (ms): 17427.9 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.321815E+00 | loss scale: 2048.0 | grad norm: 89875.557 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6710/ 159576 | consumed samples: 233600 | elapsed time per iteration (ms): 17478.2 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.318744E+00 | loss scale: 2048.0 | grad norm: 75317.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-25 06:55:50] PULSE: tr8-104B is scheduled to start in 1 day, 10:16:13 (at 2021-09-26T17:12:04) (1188168 on 'gpu_p13' partition)
[2021-09-25 06:55:50] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1185639_[2-10%1] on 'gpu_p13' partition)
[2021-09-25 06:55:50] PULSE: tr8-104B is running for 2:28:49 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)
iteration 6720/ 159576 | consumed samples: 234400 | elapsed time per iteration (ms): 17509.5 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.297193E+00 | loss scale: 2048.0 | grad norm: 136372.702 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6730/ 159576 | consumed samples: 235200 | elapsed time per iteration (ms): 17514.2 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.303332E+00 | loss scale: 2048.0 | grad norm: 84302.661 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6740/ 159576 | consumed samples: 236000 | elapsed time per iteration (ms): 17530.2 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.327809E+00 | loss scale: 2048.0 | grad norm: 84736.807 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6750/ 159576 | consumed samples: 236912 | elapsed time per iteration (ms): 18323.3 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.320579E+00 | loss scale: 2048.0 | grad norm: 68855.991 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-25 07:08:59] PULSE: tr8-104B is scheduled to start in 19:13:17 (at 2021-09-26T02:22:17) (1188168 on 'gpu_p13' partition)
[2021-09-25 07:08:59] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1185639_[2-10%1] on 'gpu_p13' partition)
[2021-09-25 07:08:59] PULSE: tr8-104B is running for 2:41:58 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)
iteration 6760/ 159576 | consumed samples: 237872 | elapsed time per iteration (ms): 18776.3 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.303013E+00 | loss scale: 2048.0 | grad norm: 69740.116 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6770/ 159576 | consumed samples: 238832 | elapsed time per iteration (ms): 18675.5 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.319376E+00 | loss scale: 2048.0 | grad norm: 83900.872 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
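At iteration 6750 the global batch size steps from 80 to 96, the first of several +16 increments visible below (96 to 112 at 7240, 112 to 128 at 7660, 128 to 144 at 8030). This is the shape of Megatron-style batch-size ramp-up driven by consumed samples; a minimal sketch (the start/step/samples-per-step/target values are illustrative, not the run's verified flags):

    def global_batch_size(consumed: int,
                          start: int = 32, step: int = 16,
                          samples_per_step: int = 47_000,   # illustrative
                          target: int = 2048) -> int:
        # grow by `step` every `samples_per_step` consumed samples, then cap
        return min(target, start + step * (consumed // samples_per_step))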
iteration 6780/ 159576 | consumed samples: 239792 | elapsed time per iteration (ms): 18605.9 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.336406E+00 | loss scale: 2048.0 | grad norm: 62443.554 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6790/ 159576 | consumed samples: 240752 | elapsed time per iteration (ms): 18746.1 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.333478E+00 | loss scale: 2048.0 | grad norm: 73606.128 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6800/ 159576 | consumed samples: 241712 | elapsed time per iteration (ms): 18688.5 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.336754E+00 | loss scale: 2048.0 | grad norm: 96323.491 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6810/ 159576 | consumed samples: 242672 | elapsed time per iteration (ms): 18568.8 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.315503E+00 | loss scale: 2048.0 | grad norm: 65008.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6820/ 159576 | consumed samples: 243632 | elapsed time per iteration (ms): 18731.9 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.301308E+00 | loss scale: 2048.0 | grad norm: 70887.665 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6830/ 159576 | consumed samples: 244592 | elapsed time per iteration (ms): 18612.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.331754E+00 | loss scale: 2048.0 | grad norm: 78393.887 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6840/ 159576 | consumed samples: 245552 | elapsed time per iteration (ms): 18584.4 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.318947E+00 | loss scale: 4096.0 | grad norm: 175812.475 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6850/ 159576 | consumed samples: 246512 | elapsed time per iteration (ms): 18855.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.349559E+00 | loss scale: 4096.0 | grad norm: 150858.899 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6860/ 159576 | consumed samples: 247472 | elapsed time per iteration (ms): 18778.5 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.341676E+00 | loss scale: 4096.0 | grad norm: 374400.560 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6870/ 159576 | consumed samples: 248432 | elapsed time per iteration (ms): 18648.3 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.313033E+00 | loss scale: 4096.0 | grad norm: 153615.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6880/ 159576 | consumed samples: 249392 | elapsed time per iteration (ms): 18783.0 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.332200E+00 | loss scale: 4096.0 | grad norm: 135045.488 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6890/ 159576 | consumed samples: 250352 | elapsed time per iteration (ms): 18757.2 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.370442E+00 | loss scale: 4096.0 | grad norm: 140003.151 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6900/ 159576 | consumed samples: 251312 | elapsed time per iteration (ms): 18547.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.426891E+00 | loss scale: 4096.0 | grad norm: 166603.752 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6910/ 159576 | consumed samples: 252272 | elapsed time per iteration (ms): 18775.5 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.383529E+00 | loss scale: 4096.0 | grad norm: 161102.692 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6920/ 159576 | consumed samples: 253232 | elapsed time per iteration (ms): 18674.9 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.362777E+00 | loss scale: 4096.0 | grad norm: 135239.756 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6930/ 159576 | consumed samples: 254192 | elapsed time per iteration (ms): 18723.1 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.348313E+00 | loss scale: 4096.0 | grad norm: 180298.634 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6940/ 159576 | consumed samples: 255152 | elapsed time per iteration (ms): 18629.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.304693E+00 | loss scale: 4096.0 | grad norm: 155481.632 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6950/ 159576 | consumed samples: 256112 | elapsed time per iteration (ms): 18736.2 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.335081E+00 | loss scale: 4096.0 | grad norm: 170157.683 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-25 08:09:15] PULSE: tr8-104B is scheduled to start in 18:13:01 (at 2021-09-26T02:22:17) (1188168 on 'gpu_p13' partition)
[2021-09-25 08:09:15] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1185639_[2-10%1] on 'gpu_p13' partition)
[2021-09-25 08:09:15] PULSE: tr8-104B is running for 3:42:14 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)
iteration 6960/ 159576 | consumed samples: 257072 | elapsed time per iteration (ms): 18679.3 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.350162E+00 | loss scale: 4096.0 | grad norm: 146048.789 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6970/ 159576 | consumed samples: 258032 | elapsed time per iteration (ms): 17405.9 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.358824E+00 | loss scale: 2048.0 | grad norm: 83822.155 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6980/ 159576 | consumed samples: 258992 | elapsed time per iteration (ms): 18714.5 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.327154E+00 | loss scale: 2048.0 | grad norm: 55012.450 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6990/ 159576 | consumed samples: 259952 | elapsed time per iteration (ms): 18649.4 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.344659E+00 | loss scale: 2048.0 | grad norm: 62132.618 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7000/ 159576 | consumed samples: 260912 | elapsed time per iteration (ms): 18706.1 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.444662E+00 | loss scale: 2048.0 | grad norm: 98258.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
------------------------------------------------------------------------------------------------
 validation loss at iteration 7000 | lm loss value: 7.174200E+00 | lm loss PPL: 1.305315E+03 |
------------------------------------------------------------------------------------------------
iteration 7010/ 159576 | consumed samples: 261872 | elapsed time per iteration (ms): 19904.0 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 1.142026E+01 | loss scale: 2048.0 | grad norm: 219645.978 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7020/ 159576 | consumed samples: 262832 | elapsed time per iteration (ms): 18580.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 1.367010E+01 | loss scale: 2048.0 | grad norm: 223286.170 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-25 08:32:28] PULSE: tr8-104B is scheduled to start in 17:49:48 (at 2021-09-26T02:22:17) (1188168 on 'gpu_p13' partition)
[2021-09-25 08:32:28] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1185639_[2-10%1] on 'gpu_p13' partition)
[2021-09-25 08:32:28] PULSE: tr8-104B is running for 4:05:27 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)
iteration 7030/ 159576 | consumed samples: 263792 | elapsed time per iteration (ms): 18402.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 1.182180E+01 | loss scale: 2048.0 | grad norm: 19931.456 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7040/ 159576 | consumed samples: 264752 | elapsed time per iteration (ms): 18461.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 9.981701E+00 | loss scale: 2048.0 | grad norm: 205737.088 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7050/ 159576 | consumed samples: 265712 | elapsed time per iteration (ms): 18431.2 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 9.425107E+00 | loss scale: 2048.0 | grad norm: 195793.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7060/ 159576 | consumed samples: 266672 | elapsed time per iteration (ms): 18498.9 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 8.606621E+00 | loss scale: 2048.0 | grad norm: 50379.603 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7070/ 159576 | consumed samples: 267632 | elapsed time per iteration (ms): 18340.3 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 8.027315E+00 | loss scale: 2048.0 | grad norm: 37173.058 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
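The "lm loss PPL" printed in the validation block is simply exp of the lm loss value; a one-line check against the iteration-7000 numbers (note also the training-loss jump from 6.44 at 7000 to 11.42 at 7010 in the surrounding records):

    import math
    # validation at iteration 7000: lm loss 7.174200 -> PPL 1.305315E+03
    assert abs(math.exp(7.174200) - 1305.315) < 0.5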
iteration 7080/ 159576 | consumed samples: 268592 | elapsed time per iteration (ms): 18563.4 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.726066E+00 | loss scale: 2048.0 | grad norm: 22946.689 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7090/ 159576 | consumed samples: 269552 | elapsed time per iteration (ms): 18408.0 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.553810E+00 | loss scale: 2048.0 | grad norm: 16048.807 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7100/ 159576 | consumed samples: 270512 | elapsed time per iteration (ms): 18353.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.394469E+00 | loss scale: 2048.0 | grad norm: 10766.157 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-25 08:57:55] PULSE: tr8-104B is scheduled to start in 17:24:21 (at 2021-09-26T02:22:17) (1188168 on 'gpu_p13' partition)
[2021-09-25 08:57:55] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1185639_[2-10%1] on 'gpu_p13' partition)
[2021-09-25 08:57:55] PULSE: tr8-104B is running for 4:30:54 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)
iteration 7110/ 159576 | consumed samples: 271472 | elapsed time per iteration (ms): 18511.6 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.327065E+00 | loss scale: 2048.0 | grad norm: 25940.869 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7120/ 159576 | consumed samples: 272432 | elapsed time per iteration (ms): 18333.5 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.337917E+00 | loss scale: 2048.0 | grad norm: 18319.505 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7130/ 159576 | consumed samples: 273392 | elapsed time per iteration (ms): 18249.8 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.273988E+00 | loss scale: 2048.0 | grad norm: 14331.807 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7140/ 159576 | consumed samples: 274352 | elapsed time per iteration (ms): 18274.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.204887E+00 | loss scale: 2048.0 | grad norm: 21767.712 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-25 09:09:21] PULSE: tr8-104B is scheduled to start in 17:12:55 (at 2021-09-26T02:22:17) (1188168 on 'gpu_p13' partition)
[2021-09-25 09:09:21] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1185639_[2-10%1] on 'gpu_p13' partition)
[2021-09-25 09:09:21] PULSE: tr8-104B is running for 4:42:20 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)
iteration 7150/ 159576 | consumed samples: 275312 | elapsed time per iteration (ms): 18318.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.195872E+00 | loss scale: 2048.0 | grad norm: 14010.173 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7160/ 159576 | consumed samples: 276272 | elapsed time per iteration (ms): 18337.2 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.136990E+00 | loss scale: 2048.0 | grad norm: 23189.415 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7170/ 159576 | consumed samples: 277232 | elapsed time per iteration (ms): 18344.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.222323E+00 | loss scale: 2048.0 | grad norm: 22610.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7180/ 159576 | consumed samples: 278192 | elapsed time per iteration (ms): 18312.6 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.156533E+00 | loss scale: 2048.0 | grad norm: 12376.987 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7190/ 159576 | consumed samples: 279152 | elapsed time per iteration (ms): 18417.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.084262E+00 | loss scale: 2048.0 | grad norm: 38647.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7200/ 159576 | consumed samples: 280112 | elapsed time per iteration (ms): 18396.8 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.110893E+00 | loss scale: 2048.0 | grad norm: 21520.416 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7210/ 159576 | consumed samples: 281072 | elapsed time per iteration (ms): 18408.8 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.294872E+00 | loss scale: 2048.0 | grad norm: 77171.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7220/ 159576 | consumed samples: 282032 | elapsed time per iteration (ms): 18333.4 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.155109E+00 | loss scale: 2048.0 | grad norm: 16921.991 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7230/ 159576 | consumed samples: 282992 | elapsed time per iteration (ms): 18398.5 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.042103E+00 | loss scale: 2048.0 | grad norm: 13510.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7240/ 159576 | consumed samples: 284032 | elapsed time per iteration (ms): 19100.0 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.964984E+00 | loss scale: 2048.0 | grad norm: 11355.587 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7250/ 159576 | consumed samples: 285152 | elapsed time per iteration (ms): 19781.1 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 7.051522E+00 | loss scale: 2048.0 | grad norm: 14836.710 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7260/ 159576 | consumed samples: 286272 | elapsed time per iteration (ms): 19836.2 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 7.050404E+00 | loss scale: 2048.0 | grad norm: 32092.591 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
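Any record above gives end-to-end throughput as global batch size over iteration time; a quick derivation from iteration 7250 (both results are derived, not logged):

    gbs, iter_ms, seq_len = 112, 19781.1, 2048
    samples_per_s = gbs / (iter_ms / 1000)    # ~5.66 samples/s
    tokens_per_s  = samples_per_s * seq_len   # ~11.6k tokens/s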
iteration 7270/ 159576 | consumed samples: 287392 | elapsed time per iteration (ms): 19719.8 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 7.034865E+00 | loss scale: 2048.0 | grad norm: 25809.031 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7280/ 159576 | consumed samples: 288512 | elapsed time per iteration (ms): 19632.8 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 7.038512E+00 | loss scale: 2048.0 | grad norm: 19816.017 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7290/ 159576 | consumed samples: 289632 | elapsed time per iteration (ms): 19704.6 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 7.051814E+00 | loss scale: 2048.0 | grad norm: 13138.906 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7300/ 159576 | consumed samples: 290752 | elapsed time per iteration (ms): 19431.1 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.962708E+00 | loss scale: 2048.0 | grad norm: 15505.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7310/ 159576 | consumed samples: 291872 | elapsed time per iteration (ms): 19625.1 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 7.068867E+00 | loss scale: 2048.0 | grad norm: 26542.834 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7320/ 159576 | consumed samples: 292992 | elapsed time per iteration (ms): 19705.6 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 7.131171E+00 | loss scale: 2048.0 | grad norm: 59185.721 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7330/ 159576 | consumed samples: 294112 | elapsed time per iteration (ms): 19592.0 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 7.030576E+00 | loss scale: 2048.0 | grad norm: 32033.660 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-25 10:09:39] PULSE: tr8-104B is scheduled to start in 17:07:05 (at 2021-09-26T03:16:45) (1188168 on 'gpu_p13' partition)
[2021-09-25 10:09:39] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1185639_[2-10%1] on 'gpu_p13' partition)
[2021-09-25 10:09:39] PULSE: tr8-104B is running for 5:42:38 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)
iteration 7340/ 159576 | consumed samples: 295232 | elapsed time per iteration (ms): 19566.4 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.981178E+00 | loss scale: 2048.0 | grad norm: 29317.971 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7350/ 159576 | consumed samples: 296352 | elapsed time per iteration (ms): 19494.3 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.969751E+00 | loss scale: 2048.0 | grad norm: 20774.916 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7360/ 159576 | consumed samples: 297472 | elapsed time per iteration (ms): 19789.2 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.939532E+00 | loss scale: 2048.0 | grad norm: 22939.531 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7370/ 159576 | consumed samples: 298592 | elapsed time per iteration (ms): 19854.7 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.888672E+00 | loss scale: 2048.0 | grad norm: 30762.881 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7380/ 159576 | consumed samples: 299712 | elapsed time per iteration (ms): 19888.4 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.906486E+00 | loss scale: 2048.0 | grad norm: 18438.642 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7390/ 159576 | consumed samples: 300832 | elapsed time per iteration (ms): 19703.2 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.877617E+00 | loss scale: 2048.0 | grad norm: 15185.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7400/ 159576 | consumed samples: 301952 | elapsed time per iteration (ms): 19654.2 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.854189E+00 | loss scale: 2048.0 | grad norm: 15960.831 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7410/ 159576 | consumed samples: 303072 | elapsed time per iteration (ms): 19528.4 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.894382E+00 | loss scale: 2048.0 | grad norm: 12842.484 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7420/ 159576 | consumed samples: 304192 | elapsed time per iteration (ms): 19701.7 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.860787E+00 | loss scale: 2048.0 | grad norm: 15167.024 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7430/ 159576 | consumed samples: 305312 | elapsed time per iteration (ms): 19702.1 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.859363E+00 | loss scale: 2048.0 | grad norm: 23062.497 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7440/ 159576 | consumed samples: 306432 | elapsed time per iteration (ms): 19933.7 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.860333E+00 | loss scale: 2048.0 | grad norm: 32840.662 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7450/ 159576 | consumed samples: 307552 | elapsed time per iteration (ms): 19857.9 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.824039E+00 | loss scale: 2048.0 | grad norm: 14512.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7460/ 159576 | consumed samples: 308672 | elapsed time per iteration (ms): 19438.9 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.828743E+00 | loss scale: 2048.0 | grad norm: 22065.697 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7470/ 159576 | consumed samples: 309792 | elapsed time per iteration (ms): 19647.3 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.799754E+00 | loss scale: 4096.0 | grad norm: 49640.058 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7480/ 159576 | consumed samples: 310912 | elapsed time per iteration (ms): 19818.5 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.815539E+00 | loss scale: 4096.0 | grad norm: 22148.104 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7490/ 159576 | consumed samples: 312032 | elapsed time per iteration (ms): 19788.8 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.894387E+00 | loss scale: 4096.0 | grad norm: 36912.117 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7500/ 159576 | consumed samples: 313152 | elapsed time per iteration (ms): 19799.3 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.841101E+00 | loss scale: 4096.0 | grad norm: 23983.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 7500 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
[2021-09-25 11:03:46,249] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/global_step7500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 7500 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
time (ms) | save-checkpoint: 18021.67
iteration 7510/ 159576 | consumed samples: 314272 | elapsed time per iteration (ms): 21444.7 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.821138E+00 | loss scale: 4096.0 | grad norm: 27340.598 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-25 11:09:42] PULSE: tr8-104B is scheduled to start in 17:10:43 (at 2021-09-26T04:20:26) (1188168 on 'gpu_p13' partition)
[2021-09-25 11:09:42] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1185639_[2-10%1] on 'gpu_p13' partition)
[2021-09-25 11:09:42] PULSE: tr8-104B is running for 6:42:41 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)
iteration 7520/ 159576 | consumed samples: 315392 | elapsed time per iteration (ms): 19669.6 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.839085E+00 | loss scale: 4096.0 | grad norm: 27168.782 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7530/ 159576 | consumed samples: 316512 | elapsed time per iteration (ms): 19673.9 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.866766E+00 | loss scale: 4096.0 | grad norm: 35661.716 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7540/ 159576 | consumed samples: 317632 | elapsed time per iteration (ms): 19547.7 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.895227E+00 | loss scale: 4096.0 | grad norm: 30950.102 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7550/ 159576 | consumed samples: 318752 | elapsed time per iteration (ms): 19728.4 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.974333E+00 | loss scale: 4096.0 | grad norm: 58146.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
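The iteration-7500 checkpoint block above is DeepSpeed writing per-rank state under a global_step<N> tag (rank 0 logs its mp_rank_00_model_states.pt shard; the whole collective save took ~18 s). A minimal sketch of the call pattern (`engine` stands for an already-initialized deepspeed.DeepSpeedEngine; every rank must make the call):

    def save(engine, iteration: int):
        save_dir = "/gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints"
        # writes save_dir/global_step<iteration>/... shards on each rank
        engine.save_checkpoint(save_dir, tag=f"global_step{iteration}")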
iteration 7560/ 159576 | consumed samples: 319872 | elapsed time per iteration (ms): 19670.9 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.993269E+00 | loss scale: 4096.0 | grad norm: 59358.983 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7570/ 159576 | consumed samples: 320992 | elapsed time per iteration (ms): 19932.4 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 7.018776E+00 | loss scale: 4096.0 | grad norm: 26693.574 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7580/ 159576 | consumed samples: 322112 | elapsed time per iteration (ms): 19801.6 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.954316E+00 | loss scale: 4096.0 | grad norm: 56910.600 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7590/ 159576 | consumed samples: 323232 | elapsed time per iteration (ms): 19757.6 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 7.019042E+00 | loss scale: 4096.0 | grad norm: 31511.156 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7600/ 159576 | consumed samples: 324352 | elapsed time per iteration (ms): 19717.1 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 7.002568E+00 | loss scale: 4096.0 | grad norm: 35214.039 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7610/ 159576 | consumed samples: 325472 | elapsed time per iteration (ms): 19801.0 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.968073E+00 | loss scale: 4096.0 | grad norm: 40886.049 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7620/ 159576 | consumed samples: 326592 | elapsed time per iteration (ms): 19491.3 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.959355E+00 | loss scale: 4096.0 | grad norm: 37865.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7630/ 159576 | consumed samples: 327712 | elapsed time per iteration (ms): 19606.0 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.927076E+00 | loss scale: 4096.0 | grad norm: 32908.139 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7640/ 159576 | consumed samples: 328832 | elapsed time per iteration (ms): 19669.6 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 7.079063E+00 | loss scale: 4096.0 | grad norm: 43561.929 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7650/ 159576 | consumed samples: 329952 | elapsed time per iteration (ms): 19813.3 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.977676E+00 | loss scale: 4096.0 | grad norm: 33954.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7660/ 159576 | consumed samples: 331120 | elapsed time per iteration (ms): 20182.2 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.071407E+00 | loss scale: 4096.0 | grad norm: 139629.093 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7670/ 159576 | consumed samples: 332400 | elapsed time per iteration (ms): 20921.2 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.133433E+00 | loss scale: 4096.0 | grad norm: 151598.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7680/ 159576 | consumed samples: 333680 | elapsed time per iteration (ms): 20923.7 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.093058E+00 | loss scale: 4096.0 | grad norm: 75854.068 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7690/ 159576 | consumed samples: 334960 | elapsed time per iteration (ms): 20468.2 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.040206E+00 | loss scale: 4096.0 | grad norm: 68735.463 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-25 12:10:01] PULSE: tr8-104B is scheduled to start in 18:54:29 (at 2021-09-26T07:04:31) (1188168 on 'gpu_p13' partition)
[2021-09-25 12:10:01] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1185639_[2-10%1] on 'gpu_p13' partition)
[2021-09-25 12:10:01] PULSE: tr8-104B is running for 7:43:00 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)
iteration 7700/ 159576 | consumed samples: 336240 | elapsed time per iteration (ms): 20712.9 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.991071E+00 | loss scale: 4096.0 | grad norm: 49058.974 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7710/ 159576 | consumed samples: 337520 | elapsed time per iteration (ms): 20803.8 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.999660E+00 | loss scale: 4096.0 | grad norm: 50810.796 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7720/ 159576 | consumed samples: 338800 | elapsed time per iteration (ms): 21027.6 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.148920E+00 | loss scale: 4096.0 | grad norm: 34526.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7730/ 159576 | consumed samples: 340080 | elapsed time per iteration (ms): 20621.1 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.952879E+00 | loss scale: 4096.0 | grad norm: 46587.607 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7740/ 159576 | consumed samples: 341360 | elapsed time per iteration (ms): 20787.7 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.077150E+00 | loss scale: 4096.0 | grad norm: 53834.886 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7750/ 159576 | consumed samples: 342640 | elapsed time per iteration (ms): 20790.5 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.024051E+00 | loss scale: 4096.0 | grad norm: 108296.631 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7760/ 159576 | consumed samples: 343920 | elapsed time per iteration (ms): 20756.3 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.185934E+00 | loss scale: 4096.0 | grad norm: 40243.918 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7770/ 159576 | consumed samples: 345200 | elapsed time per iteration (ms): 20678.9 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.155985E+00 | loss scale: 4096.0 | grad norm: 45818.733 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7780/ 159576 | consumed samples: 346480 | elapsed time per iteration (ms): 20656.6 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.028696E+00 | loss scale: 4096.0 | grad norm: 54814.681 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7790/ 159576 | consumed samples: 347760 | elapsed time per iteration (ms): 20773.2 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.962093E+00 | loss scale: 4096.0 | grad norm: 57105.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7800/ 159576 | consumed samples: 349040 | elapsed time per iteration (ms): 20735.7 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.054767E+00 | loss scale: 4096.0 | grad norm: 74767.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7810/ 159576 | consumed samples: 350320 | elapsed time per iteration (ms): 20748.9 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.948767E+00 | loss scale: 4096.0 | grad norm: 103822.696 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7820/ 159576 | consumed samples: 351600 | elapsed time per iteration (ms): 20609.0 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.995116E+00 | loss scale: 4096.0 | grad norm: 70594.913 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7830/ 159576 | consumed samples: 352880 | elapsed time per iteration (ms): 20891.2 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.140380E+00 | loss scale: 4096.0 | grad norm: 50257.684 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7840/ 159576 | consumed samples: 354160 | elapsed time per iteration (ms): 20736.5 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.051595E+00 | loss scale: 4096.0 | grad norm: 62967.110 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7850/ 159576 | consumed samples: 355440 | elapsed time per iteration (ms): 20790.1 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.921895E+00 | loss scale: 4096.0 | grad norm: 104168.914 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7860/ 159576 | consumed samples: 356720 | elapsed time per iteration (ms): 20774.7 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.071528E+00 | loss scale: 4096.0 | grad norm: 193610.451 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7870/ 159576 | consumed samples: 358000 | elapsed time per iteration (ms): 20837.0 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.086633E+00 | loss scale: 4096.0 | grad norm: 56330.990 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-25 13:10:06] PULSE: tr8-104B is scheduled to start in 17:54:24 (at 2021-09-26T07:04:31) (1188168 on 'gpu_p13' partition)
[2021-09-25 13:10:06] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1185639_[2-10%1] on 'gpu_p13' partition)
[2021-09-25 13:10:06] PULSE: tr8-104B is running for 8:43:05 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)
iteration 7880/ 159576 | consumed samples: 359280 | elapsed time per iteration (ms): 20746.8 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.156522E+00 | loss scale: 4096.0 | grad norm: 137295.607 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7890/ 159576 | consumed samples: 360560 | elapsed time per iteration (ms): 20983.7 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.996352E+00 | loss scale: 4096.0 | grad norm: 67763.557 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7900/ 159576 | consumed samples: 361840 | elapsed time per iteration (ms): 20640.0 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.985654E+00 | loss scale: 4096.0 | grad norm: 113013.123 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7910/ 159576 | consumed samples: 363120 | elapsed time per iteration (ms): 20742.0 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.976338E+00 | loss scale: 4096.0 | grad norm: 73140.648 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7920/ 159576 | consumed samples: 364400 | elapsed time per iteration (ms): 20679.4 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.917073E+00 | loss scale: 4096.0 | grad norm: 83861.566 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7930/ 159576 | consumed samples: 365680 | elapsed time per iteration (ms): 20531.8 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.971965E+00 | loss scale: 4096.0 | grad norm: 57978.154 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7940/ 159576 | consumed samples: 366960 | elapsed time per iteration (ms): 20446.7 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.117603E+00 | loss scale: 4096.0 | grad norm: 218144.909 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7950/ 159576 | consumed samples: 368240 | elapsed time per iteration (ms): 20823.5 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.029739E+00 | loss scale: 4096.0 | grad norm: 46987.640 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7960/ 159576 | consumed samples: 369520 | elapsed time per iteration (ms): 20775.8 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.972835E+00 | loss scale: 4096.0 | grad norm: 59193.517 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7970/ 159576 | consumed samples: 370800 | elapsed time per iteration (ms): 20508.8 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.890491E+00 | loss scale: 8192.0 | grad norm: 102786.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7980/ 159576 | consumed samples: 372080 | elapsed time per iteration (ms): 20983.1 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.927078E+00 | loss scale: 8192.0 | grad norm: 117997.551 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
6.000E-05 | global batch size: 128 | lm loss: 6.927078E+00 | loss scale: 8192.0 | grad norm: 117997.551 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7990/ 159576 | consumed samples: 373360 | elapsed time per iteration (ms): 20495.7 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.823578E+00 | loss scale: 8192.0 | grad norm: 123947.033 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-25 13:53:58,625] [INFO] [logging.py:68:log_dist] [Rank 0] step=8000, skipped=17, lr=[5.999979430007177e-05, 5.999979430007177e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 8000 loss: 6.8207 iter time (s): 0.010 samples/sec: 13060.948
iteration 8000/ 159576 | consumed samples: 374640 | elapsed time per iteration (ms): 20659.2 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.884979E+00 | loss scale: 8192.0 | grad norm: 131468.178 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
------------------------------------------------------------------------------------------------
validation loss at iteration 8000 | lm loss value: 6.791678E+00 | lm loss PPL: 8.904064E+02 |
------------------------------------------------------------------------------------------------
iteration 8010/ 159576 | consumed samples: 375920 | elapsed time per iteration (ms): 22008.9 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.826038E+00 | loss scale: 8192.0 | grad norm: 154245.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8020/ 159576 | consumed samples: 377200 | elapsed time per iteration (ms): 20587.9 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.870419E+00 | loss scale: 8192.0 | grad norm: 129858.542 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8030/ 159576 | consumed samples: 378544 | elapsed time per iteration (ms): 21288.4 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.928481E+00 | loss scale: 8192.0 | grad norm: 226677.481 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8040/ 159576 | consumed samples: 379984 | elapsed time per iteration (ms): 21881.6 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.896291E+00 | loss scale: 8192.0 | grad norm: 205623.823 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-25 14:10:08] PULSE: tr8-104B is scheduled to start in 17:26:04 (at 2021-09-26T07:36:13) (1188168 on 'gpu_p13' partition)
[2021-09-25 14:10:08] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1185639_[2-10%1] on 'gpu_p13' partition)
[2021-09-25 14:10:08] PULSE: tr8-104B is running for 9:43:07 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)
iteration 8050/ 159576 | consumed samples: 381424 | elapsed time per iteration (ms): 21696.5 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.873873E+00 | loss scale: 8192.0 | grad norm: 146153.031 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8060/ 159576 | consumed samples: 382864 |
elapsed time per iteration (ms): 21810.7 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.853185E+00 | loss scale: 8192.0 | grad norm: 101607.158 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8070/ 159576 | consumed samples: 384304 | elapsed time per iteration (ms): 21802.4 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.850246E+00 | loss scale: 8192.0 | grad norm: 139070.087 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8080/ 159576 | consumed samples: 385744 | elapsed time per iteration (ms): 21831.7 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.848817E+00 | loss scale: 8192.0 | grad norm: 129639.082 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8090/ 159576 | consumed samples: 387184 | elapsed time per iteration (ms): 21715.3 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.856639E+00 | loss scale: 8192.0 | grad norm: 200364.806 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8100/ 159576 | consumed samples: 388624 | elapsed time per iteration (ms): 21801.4 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.869398E+00 | loss scale: 8192.0 | grad norm: 141893.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8110/ 159576 | consumed samples: 390064 | elapsed time per iteration (ms): 21693.5 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.834469E+00 | loss scale: 8192.0 | grad norm: 133792.650 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8120/ 159576 | consumed samples: 391504 | elapsed time per iteration (ms): 21798.3 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.845126E+00 | loss scale: 8192.0 | grad norm: 196465.435 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8130/ 159576 | consumed samples: 392944 | elapsed time per iteration (ms): 21718.4 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.864041E+00 | loss scale: 8192.0 | grad norm: 234002.522 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8140/ 159576 | consumed samples: 394384 | elapsed time per iteration (ms): 20974.7 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.866895E+00 | loss scale: 8192.0 | grad norm: 214792.051 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8150/ 159576 | consumed samples: 395824 | elapsed time per iteration (ms): 20962.3 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.949483E+00 | loss scale: 4096.0 | grad norm: 129105.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8160/ 159576 | consumed samples: 397264 | elapsed time per iteration (ms): 21839.6 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.982524E+00 | loss scale: 4096.0 | grad norm: 104094.455 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8170/ 159576 | consumed samples: 398704 | elapsed time per iteration (ms): 21626.3 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.968035E+00 | loss scale: 4096.0 | grad norm: 85705.545 | num zeros: 0.0 
| number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8180/ 159576 | consumed samples: 400144 | elapsed time per iteration (ms): 21733.4 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.983526E+00 | loss scale: 4096.0 | grad norm: 140563.515 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8190/ 159576 | consumed samples: 401584 | elapsed time per iteration (ms): 21768.5 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 7.016048E+00 | loss scale: 4096.0 | grad norm: 72531.033 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8200/ 159576 | consumed samples: 403024 | elapsed time per iteration (ms): 21929.8 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.996774E+00 | loss scale: 4096.0 | grad norm: 128628.095 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8210/ 159576 | consumed samples: 404464 | elapsed time per iteration (ms): 21876.8 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.954953E+00 | loss scale: 4096.0 | grad norm: 114237.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-25 15:10:12] PULSE: tr8-104B is scheduled to start in 20:25:18 (at 2021-09-26T11:35:31) (1188168 on 'gpu_p13' partition) [2021-09-25 15:10:12] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1185639_[2-10%1] on 'gpu_p13' partition) [2021-09-25 15:10:12] PULSE: tr8-104B is running for 10:43:11 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0) iteration 8220/ 159576 | consumed samples: 405904 | elapsed time per iteration (ms): 21992.9 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.927856E+00 | loss scale: 4096.0 | grad norm: 191859.936 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8230/ 159576 | consumed samples: 407344 | elapsed time per iteration (ms): 21845.4 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.915263E+00 | loss scale: 4096.0 | grad norm: 136325.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8240/ 159576 | consumed samples: 408784 | elapsed time per iteration (ms): 21179.2 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.864025E+00 | loss scale: 2048.0 | grad norm: 118355.574 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8250/ 159576 | consumed samples: 410224 | elapsed time per iteration (ms): 21688.2 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.873029E+00 | loss scale: 2048.0 | grad norm: 72612.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8260/ 159576 | consumed samples: 411664 | elapsed time per iteration (ms): 21621.0 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.963725E+00 | loss scale: 2048.0 | grad norm: 77677.833 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8270/ 159576 | consumed samples: 413104 | elapsed time per iteration (ms): 21832.0 | 
learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.939199E+00 | loss scale: 2048.0 | grad norm: 80021.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8280/ 159576 | consumed samples: 414544 | elapsed time per iteration (ms): 21967.3 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.919482E+00 | loss scale: 2048.0 | grad norm: 58905.568 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8290/ 159576 | consumed samples: 415984 | elapsed time per iteration (ms): 21671.6 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.919662E+00 | loss scale: 2048.0 | grad norm: 52571.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8300/ 159576 | consumed samples: 417424 | elapsed time per iteration (ms): 21755.6 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 7.024297E+00 | loss scale: 2048.0 | grad norm: 77079.083 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8310/ 159576 | consumed samples: 418864 | elapsed time per iteration (ms): 21909.8 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 7.234490E+00 | loss scale: 2048.0 | grad norm: 102216.544 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8320/ 159576 | consumed samples: 420304 | elapsed time per iteration (ms): 21566.6 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 7.228243E+00 | loss scale: 2048.0 | grad norm: 88135.536 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8330/ 159576 | consumed samples: 421744 | elapsed time per iteration (ms): 22069.0 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 7.068048E+00 | loss scale: 2048.0 | grad norm: 65341.009 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8340/ 159576 | consumed samples: 423184 | elapsed time per iteration (ms): 21682.1 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 7.049673E+00 | loss scale: 2048.0 | grad norm: 45586.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8350/ 159576 | consumed samples: 424624 | elapsed time per iteration (ms): 21918.1 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 7.033588E+00 | loss scale: 2048.0 | grad norm: 60230.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8360/ 159576 | consumed samples: 426160 | elapsed time per iteration (ms): 22474.7 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.032515E+00 | loss scale: 2048.0 | grad norm: 55714.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8370/ 159576 | consumed samples: 427760 | elapsed time per iteration (ms): 22723.0 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.051062E+00 | loss scale: 2048.0 | grad norm: 68784.584 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-25 16:10:22] PULSE: tr8-104B is scheduled to start in 19:16:12 (at 2021-09-26T11:26:35) (1188168 on 'gpu_p13' partition) [2021-09-25 16:10:22] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1185639_[2-10%1] on 'gpu_p13' 
partition) [2021-09-25 16:10:22] PULSE: tr8-104B is running for 11:43:21 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0) iteration 8380/ 159576 | consumed samples: 429360 | elapsed time per iteration (ms): 22974.1 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.025337E+00 | loss scale: 2048.0 | grad norm: 89725.468 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8390/ 159576 | consumed samples: 430960 | elapsed time per iteration (ms): 22266.9 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.010270E+00 | loss scale: 1024.0 | grad norm: 33629.138 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8400/ 159576 | consumed samples: 432560 | elapsed time per iteration (ms): 22964.2 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.020833E+00 | loss scale: 1024.0 | grad norm: 46812.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8410/ 159576 | consumed samples: 434160 | elapsed time per iteration (ms): 22923.5 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.044554E+00 | loss scale: 1024.0 | grad norm: 55335.802 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8420/ 159576 | consumed samples: 435760 | elapsed time per iteration (ms): 22690.3 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.074860E+00 | loss scale: 1024.0 | grad norm: 27018.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8430/ 159576 | consumed samples: 437360 | elapsed time per iteration (ms): 22997.6 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.108445E+00 | loss scale: 1024.0 | grad norm: 95058.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8440/ 159576 | consumed samples: 438960 | elapsed time per iteration (ms): 22696.4 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.128921E+00 | loss scale: 1024.0 | grad norm: 44470.175 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8450/ 159576 | consumed samples: 440560 | elapsed time per iteration (ms): 22728.4 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.037349E+00 | loss scale: 1024.0 | grad norm: 32995.810 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8460/ 159576 | consumed samples: 442160 | elapsed time per iteration (ms): 22856.0 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.064864E+00 | loss scale: 1024.0 | grad norm: 23093.772 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8470/ 159576 | consumed samples: 443760 | elapsed time per iteration (ms): 22824.5 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.057752E+00 | loss scale: 1024.0 | grad norm: 34580.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8480/ 159576 | consumed samples: 445360 | elapsed time per iteration (ms): 22939.9 | learning rate: 6.000E-05 | global batch size: 160 | lm 
loss: 7.111783E+00 | loss scale: 1024.0 | grad norm: 30415.135 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8490/ 159576 | consumed samples: 446960 | elapsed time per iteration (ms): 22647.3 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.077787E+00 | loss scale: 1024.0 | grad norm: 44228.518 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8500/ 159576 | consumed samples: 448560 | elapsed time per iteration (ms): 22870.1 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.017307E+00 | loss scale: 1024.0 | grad norm: 31106.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-25 17:00:02] PULSE: tr8-104B is scheduled to start in 18:26:32 (at 2021-09-26T11:26:35) (1188168 on 'gpu_p13' partition) [2021-09-25 17:00:02] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1185639_[2-10%1] on 'gpu_p13' partition) [2021-09-25 17:00:02] PULSE: tr8-104B is running for 12:33:01 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0) iteration 8510/ 159576 | consumed samples: 450160 | elapsed time per iteration (ms): 22836.1 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.033496E+00 | loss scale: 1024.0 | grad norm: 84589.712 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8520/ 159576 | consumed samples: 451760 | elapsed time per iteration (ms): 22678.6 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.034415E+00 | loss scale: 1024.0 | grad norm: 45889.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8530/ 159576 | consumed samples: 453360 | elapsed time per iteration (ms): 22820.3 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.022775E+00 | loss scale: 1024.0 | grad norm: 46421.613 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-25 17:10:31] PULSE: tr8-104B is scheduled to start in 18:16:03 (at 2021-09-26T11:26:35) (1188168 on 'gpu_p13' partition) [2021-09-25 17:10:31] PULSE: tr8-104B is running for 12:43:30 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0) iteration 8540/ 159576 | consumed samples: 454960 | elapsed time per iteration (ms): 22803.2 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.015056E+00 | loss scale: 1024.0 | grad norm: 49138.667 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8550/ 159576 | consumed samples: 456560 | elapsed time per iteration (ms): 22969.4 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.037695E+00 | loss scale: 1024.0 | grad norm: 72675.159 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8560/ 159576 | consumed samples: 458160 | elapsed time per iteration (ms): 22624.1 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.040105E+00 | 
loss scale: 1024.0 | grad norm: 55417.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8570/ 159576 | consumed samples: 459760 | elapsed time per iteration (ms): 22663.1 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.066528E+00 | loss scale: 1024.0 | grad norm: 48492.969 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-25 17:26:58] PULSE: tr8-104B is scheduled to start in 17:59:36 (at 2021-09-26T11:26:35) (1188168 on 'gpu_p13' partition) [2021-09-25 17:26:58] PULSE: tr8-104B is running for 12:59:57 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0) iteration 8580/ 159576 | consumed samples: 461360 | elapsed time per iteration (ms): 22688.8 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.087028E+00 | loss scale: 1024.0 | grad norm: 46974.842 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8590/ 159576 | consumed samples: 462960 | elapsed time per iteration (ms): 22699.4 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.089204E+00 | loss scale: 1024.0 | grad norm: 44702.862 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8600/ 159576 | consumed samples: 464560 | elapsed time per iteration (ms): 22777.7 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.149306E+00 | loss scale: 1024.0 | grad norm: 261339.801 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8610/ 159576 | consumed samples: 466160 | elapsed time per iteration (ms): 22975.5 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.167276E+00 | loss scale: 1024.0 | grad norm: 105455.551 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8620/ 159576 | consumed samples: 467760 | elapsed time per iteration (ms): 23048.5 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.078442E+00 | loss scale: 1024.0 | grad norm: 84212.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8630/ 159576 | consumed samples: 469360 | elapsed time per iteration (ms): 22799.5 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.081234E+00 | loss scale: 1024.0 | grad norm: 52121.419 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8640/ 159576 | consumed samples: 470960 | elapsed time per iteration (ms): 22720.5 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.109283E+00 | loss scale: 1024.0 | grad norm: 48651.489 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8650/ 159576 | consumed samples: 472560 | elapsed time per iteration (ms): 22695.2 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.118199E+00 | loss scale: 1024.0 | grad norm: 26046.891 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8660/ 159576 | consumed samples: 474320 | elapsed time per iteration (ms): 23933.5 | learning rate: 6.000E-05 | global batch size: 176 | lm loss: 7.064212E+00 | loss scale: 1024.0 | grad norm: 
40523.058 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8670/ 159576 | consumed samples: 476080 | elapsed time per iteration (ms): 23798.1 | learning rate: 6.000E-05 | global batch size: 176 | lm loss: 7.051229E+00 | loss scale: 1024.0 | grad norm: 28160.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8680/ 159576 | consumed samples: 477840 | elapsed time per iteration (ms): 23923.9 | learning rate: 6.000E-05 | global batch size: 176 | lm loss: 7.036906E+00 | loss scale: 1024.0 | grad norm: 51047.866 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8690/ 159576 | consumed samples: 479600 | elapsed time per iteration (ms): 23651.1 | learning rate: 6.000E-05 | global batch size: 176 | lm loss: 7.073657E+00 | loss scale: 1024.0 | grad norm: 141610.865 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-25 18:10:35] PULSE: tr8-104B is scheduled to start in 17:15:59 (at 2021-09-26T11:26:35) (1188168 on 'gpu_p13' partition) [2021-09-25 18:10:35] PULSE: tr8-104B is running for 13:43:34 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0) iteration 8700/ 159576 | consumed samples: 481360 | elapsed time per iteration (ms): 23943.4 | learning rate: 6.000E-05 | global batch size: 176 | lm loss: 7.071510E+00 | loss scale: 1024.0 | grad norm: 24381.440 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8710/ 159576 | consumed samples: 483120 | elapsed time per iteration (ms): 23910.3 | learning rate: 6.000E-05 | global batch size: 176 | lm loss: 7.190697E+00 | loss scale: 1024.0 | grad norm: 41525.807 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8720/ 159576 | consumed samples: 484880 | elapsed time per iteration (ms): 23923.5 | learning rate: 6.000E-05 | global batch size: 176 | lm loss: 7.332158E+00 | loss scale: 1024.0 | grad norm: 23580.074 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8730/ 159576 | consumed samples: 486640 | elapsed time per iteration (ms): 23664.9 | learning rate: 6.000E-05 | global batch size: 176 | lm loss: 7.250137E+00 | loss scale: 1024.0 | grad norm: 33934.114 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8740/ 159576 | consumed samples: 488400 | elapsed time per iteration (ms): 24002.8 | learning rate: 6.000E-05 | global batch size: 176 | lm loss: 7.134158E+00 | loss scale: 1024.0 | grad norm: 18917.778 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8750/ 159576 | consumed samples: 490160 | elapsed time per iteration (ms): 23812.9 | learning rate: 6.000E-05 | global batch size: 176 | lm loss: 7.133132E+00 | loss scale: 1024.0 | grad norm: 24524.875 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8760/ 159576 | consumed samples: 491920 | elapsed time per iteration (ms): 24164.0 | learning rate: 6.000E-05 | global batch size: 176 | lm loss: 7.089709E+00 | loss scale: 1024.0 | grad norm: 18466.411 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8770/ 159576 | consumed samples: 493680 | elapsed time per iteration (ms): 23763.0 | learning rate: 6.000E-05 | global batch size: 176 | lm loss: 7.075866E+00 | loss scale: 1024.0 | grad norm: 21160.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8780/ 159576 | consumed samples: 495440 | elapsed time per iteration (ms): 23757.0 | learning rate: 6.000E-05 | global batch size: 176 | lm loss: 7.105405E+00 | loss scale: 1024.0 | grad norm: 21012.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8790/ 159576 | consumed samples: 497200 | elapsed time per iteration (ms): 23726.0 | learning rate: 6.000E-05 | global batch size: 176 | lm loss: 7.119524E+00 | loss scale: 1024.0 | grad norm: 19184.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-25 18:51:17] PULSE: tr8-104B is scheduled to start in 19:55:07 (at 2021-09-26T14:46:25) (1188168 on 'gpu_p13' partition) [2021-09-25 18:51:17] PULSE: tr8-104B is running for 14:24:16 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0) iteration 8800/ 159576 | consumed samples: 498960 | elapsed time per iteration (ms): 23872.5 | learning rate: 6.000E-05 | global batch size: 176 | lm loss: 7.150304E+00 | loss scale: 1024.0 | grad norm: 20582.002 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8810/ 159576 | consumed samples: 500720 | elapsed time per iteration (ms): 23674.3 | learning rate: 6.000E-05 | global batch size: 176 | lm loss: 7.121466E+00 | loss scale: 1024.0 | grad norm: 26026.638 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8820/ 159576 | consumed samples: 502480 | elapsed time per iteration (ms): 23655.3 | learning rate: 6.000E-05 | global batch size: 176 | lm loss: 7.227619E+00 | loss scale: 1024.0 | grad norm: 19493.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8830/ 159576 | consumed samples: 504240 | elapsed time per iteration (ms): 24040.7 | learning rate: 6.000E-05 | global batch size: 176 | lm loss: 7.202127E+00 | loss scale: 1024.0 | grad norm: 21130.889 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8840/ 159576 | consumed samples: 506000 | elapsed time per iteration (ms): 23751.6 | learning rate: 6.000E-05 | global batch size: 176 | lm loss: 7.102602E+00 | loss scale: 1024.0 | grad norm: 15258.781 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-25 19:10:38] PULSE: tr8-104B is scheduled to start in 19:35:46 (at 2021-09-26T14:46:25) (1188168 on 'gpu_p13' partition) [2021-09-25 19:10:38] PULSE: tr8-104B is running for 14:43:37 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0) iteration 8850/ 159576 | consumed samples: 507760 | elapsed time per iteration (ms): 23681.3 | learning rate: 
6.000E-05 | global batch size: 176 | lm loss: 7.106478E+00 | loss scale: 1024.0 | grad norm: 15650.558 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8860/ 159576 | consumed samples: 509520 | elapsed time per iteration (ms): 23830.0 | learning rate: 6.000E-05 | global batch size: 176 | lm loss: 7.077826E+00 | loss scale: 1024.0 | grad norm: 13271.961 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8870/ 159576 | consumed samples: 511280 | elapsed time per iteration (ms): 23830.3 | learning rate: 6.000E-05 | global batch size: 176 | lm loss: 7.083195E+00 | loss scale: 1024.0 | grad norm: 13942.816 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8880/ 159576 | consumed samples: 513040 | elapsed time per iteration (ms): 23893.7 | learning rate: 6.000E-05 | global batch size: 176 | lm loss: 7.101151E+00 | loss scale: 1024.0 | grad norm: 17666.067 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8890/ 159576 | consumed samples: 514800 | elapsed time per iteration (ms): 23733.4 | learning rate: 6.000E-05 | global batch size: 176 | lm loss: 7.130984E+00 | loss scale: 2048.0 | grad norm: 41179.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8900/ 159576 | consumed samples: 516560 | elapsed time per iteration (ms): 23693.0 | learning rate: 6.000E-05 | global batch size: 176 | lm loss: 7.084023E+00 | loss scale: 2048.0 | grad norm: 32703.102 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8910/ 159576 | consumed samples: 518320 | elapsed time per iteration (ms): 23793.1 | learning rate: 6.000E-05 | global batch size: 176 | lm loss: 7.094463E+00 | loss scale: 2048.0 | grad norm: 46954.552 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8920/ 159576 | consumed samples: 520112 | elapsed time per iteration (ms): 23988.6 | learning rate: 6.000E-05 | global batch size: 192 | lm loss: 7.094890E+00 | loss scale: 2048.0 | grad norm: 20910.711 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8930/ 159576 | consumed samples: 522032 | elapsed time per iteration (ms): 24780.5 | learning rate: 6.000E-05 | global batch size: 192 | lm loss: 7.112840E+00 | loss scale: 2048.0 | grad norm: 23723.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8940/ 159576 | consumed samples: 523952 | elapsed time per iteration (ms): 24880.9 | learning rate: 6.000E-05 | global batch size: 192 | lm loss: 7.157214E+00 | loss scale: 2048.0 | grad norm: 35769.072 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8950/ 159576 | consumed samples: 525872 | elapsed time per iteration (ms): 24820.3 | learning rate: 6.000E-05 | global batch size: 192 | lm loss: 7.212303E+00 | loss scale: 2048.0 | grad norm: 20241.796 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8960/ 159576 | consumed samples: 527792 | elapsed time per iteration (ms): 24706.7 | learning rate: 6.000E-05 | global batch size: 192 | lm loss: 7.215181E+00 | loss scale: 2048.0 | grad norm: 48969.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
time (ms)
iteration 8970/ 159576 | consumed samples: 529712 | elapsed time per iteration (ms): 23528.3 | learning rate: 6.000E-05 | global batch size: 192 | loss scale: 1024.0 | grad norm: 156762.139 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8980/ 159576 | consumed samples: 531632 | elapsed time per iteration (ms): 18302.5 | learning rate: 6.000E-05 | global batch size: 192 | loss scale: 2.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8990/ 159576 | consumed samples: 533552 | elapsed time per iteration (ms): 17645.0 | learning rate: 6.000E-05 | global batch size: 192 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-25 20:10:52] PULSE: tr8-104B is scheduled to start in 18:35:32 (at 2021-09-26T14:46:25) (1188168 on 'gpu_p13' partition)
[2021-09-25 20:10:52] PULSE: tr8-104B is running for 15:43:51 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)
iteration 9000/ 159576 | consumed samples: 535472 | elapsed time per iteration (ms): 17316.3 | learning rate: 6.000E-05 | global batch size: 192 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
------------------------------------------------------------------------------------------------
validation loss at iteration 9000 | lm loss value: 7.256732E+00 | lm loss PPL: 1.417617E+03 |
------------------------------------------------------------------------------------------------
saving checkpoint at iteration 9000 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
[2021-09-25 20:11:32,719] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/global_step9000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 9000 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
time (ms) | save-checkpoint: 17709.49
iteration 9010/ 159576 | consumed samples: 537392 | elapsed time per iteration (ms): 21623.6 | learning rate: 6.000E-05 | global batch size: 192 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 9020/ 159576 | consumed samples: 539312 | elapsed time per iteration (ms): 17559.0 | learning rate: 6.000E-05 | global batch size: 192 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 9030/ 159576 | consumed samples: 541232 | elapsed time per iteration (ms): 17827.7 | learning rate: 6.000E-05 | global batch size: 192 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 9040/ 159576 | consumed samples: 543152 | elapsed time per iteration (ms): 17458.2 | learning rate: 6.000E-05 | global batch size: 192 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 9050/ 159576 | consumed samples: 545072 | elapsed time per iteration (ms): 17470.7 | learning rate:
6.000E-05 | global batch size: 192 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9060/ 159576 | consumed samples: 546992 | elapsed time per iteration (ms): 17813.0 | learning rate: 6.000E-05 | global batch size: 192 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9070/ 159576 | consumed samples: 548912 | elapsed time per iteration (ms): 17646.8 | learning rate: 6.000E-05 | global batch size: 192 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9080/ 159576 | consumed samples: 550832 | elapsed time per iteration (ms): 17634.4 | learning rate: 6.000E-05 | global batch size: 192 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9090/ 159576 | consumed samples: 552752 | elapsed time per iteration (ms): 17734.2 | learning rate: 6.000E-05 | global batch size: 192 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9100/ 159576 | consumed samples: 554672 | elapsed time per iteration (ms): 17470.3 | learning rate: 6.000E-05 | global batch size: 192 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9110/ 159576 | consumed samples: 556592 | elapsed time per iteration (ms): 17443.8 | learning rate: 6.000E-05 | global batch size: 192 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9120/ 159576 | consumed samples: 558512 | elapsed time per iteration (ms): 17456.2 | learning rate: 6.000E-05 | global batch size: 192 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9130/ 159576 | consumed samples: 560432 | elapsed time per iteration (ms): 17374.7 | learning rate: 6.000E-05 | global batch size: 192 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9140/ 159576 | consumed samples: 562352 | elapsed time per iteration (ms): 17541.4 | learning rate: 6.000E-05 | global batch size: 192 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9150/ 159576 | consumed samples: 564272 | elapsed time per iteration (ms): 17680.4 | learning rate: 6.000E-05 | global batch size: 192 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9160/ 159576 | consumed samples: 566192 | elapsed time per iteration (ms): 17412.1 | learning rate: 6.000E-05 | global batch size: 192 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9170/ 159576 | consumed samples: 568208 | elapsed time per iteration (ms): 18281.1 | learning rate: 6.000E-05 | global batch size: 208 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9180/ 159576 | consumed samples: 570288 
| elapsed time per iteration (ms): 18627.2 | learning rate: 6.000E-05 | global batch size: 208 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9190/ 159576 | consumed samples: 572368 | elapsed time per iteration (ms): 18546.6 | learning rate: 6.000E-05 | global batch size: 208 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-25 21:10:54] PULSE: tr8-104B is scheduled to start in 17:35:30 (at 2021-09-26T14:46:25) (1188168 on 'gpu_p13' partition) [2021-09-25 21:10:54] PULSE: tr8-104B is running for 16:43:53 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0) iteration 9200/ 159576 | consumed samples: 574448 | elapsed time per iteration (ms): 18675.7 | learning rate: 6.000E-05 | global batch size: 208 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9210/ 159576 | consumed samples: 576528 | elapsed time per iteration (ms): 18679.9 | learning rate: 6.000E-05 | global batch size: 208 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9220/ 159576 | consumed samples: 578608 | elapsed time per iteration (ms): 18524.7 | learning rate: 6.000E-05 | global batch size: 208 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9230/ 159576 | consumed samples: 580688 | elapsed time per iteration (ms): 18762.7 | learning rate: 6.000E-05 | global batch size: 208 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9240/ 159576 | consumed samples: 582768 | elapsed time per iteration (ms): 18695.7 | learning rate: 6.000E-05 | global batch size: 208 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9250/ 159576 | consumed samples: 584848 | elapsed time per iteration (ms): 18780.0 | learning rate: 6.000E-05 | global batch size: 208 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9260/ 159576 | consumed samples: 586928 | elapsed time per iteration (ms): 18593.2 | learning rate: 6.000E-05 | global batch size: 208 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9270/ 159576 | consumed samples: 589008 | elapsed time per iteration (ms): 18476.6 | learning rate: 6.000E-05 | global batch size: 208 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9280/ 159576 | consumed samples: 591088 | elapsed time per iteration (ms): 18595.2 | learning rate: 6.000E-05 | global batch size: 208 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9290/ 159576 | consumed samples: 593168 | elapsed time per iteration (ms): 
18498.1 | learning rate: 6.000E-05 | global batch size: 208 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9300/ 159576 | consumed samples: 595248 | elapsed time per iteration (ms): 18531.6 | learning rate: 6.000E-05 | global batch size: 208 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9310/ 159576 | consumed samples: 597328 | elapsed time per iteration (ms): 18538.6 | learning rate: 6.000E-05 | global batch size: 208 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9320/ 159576 | consumed samples: 599408 | elapsed time per iteration (ms): 18768.3 | learning rate: 6.000E-05 | global batch size: 208 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9330/ 159576 | consumed samples: 601488 | elapsed time per iteration (ms): 18445.0 | learning rate: 6.000E-05 | global batch size: 208 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9340/ 159576 | consumed samples: 603568 | elapsed time per iteration (ms): 18700.8 | learning rate: 6.000E-05 | global batch size: 208 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9350/ 159576 | consumed samples: 605648 | elapsed time per iteration (ms): 18716.7 | learning rate: 6.000E-05 | global batch size: 208 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9360/ 159576 | consumed samples: 607728 | elapsed time per iteration (ms): 18488.0 | learning rate: 6.000E-05 | global batch size: 208 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9370/ 159576 | consumed samples: 609808 | elapsed time per iteration (ms): 18621.0 | learning rate: 6.000E-05 | global batch size: 208 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9380/ 159576 | consumed samples: 611888 | elapsed time per iteration (ms): 18781.4 | learning rate: 6.000E-05 | global batch size: 208 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9390/ 159576 | consumed samples: 613968 | elapsed time per iteration (ms): 18582.4 | learning rate: 6.000E-05 | global batch size: 208 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-25 22:11:04] PULSE: tr8-104B is scheduled to start in 17:17:05 (at 2021-09-26T15:28:10) (1188168 on 'gpu_p13' partition) [2021-09-25 22:11:04] PULSE: tr8-104B is running for 17:44:03 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0) iteration 9400/ 159576 | consumed samples: 616192 | elapsed time per iteration (ms): 19918.8 | learning rate: 6.000E-05 
| global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9410/ 159576 | consumed samples: 618432 | elapsed time per iteration (ms): 19675.6 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9420/ 159576 | consumed samples: 620672 | elapsed time per iteration (ms): 19904.3 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9430/ 159576 | consumed samples: 622912 | elapsed time per iteration (ms): 19702.9 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9440/ 159576 | consumed samples: 625152 | elapsed time per iteration (ms): 19798.2 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9450/ 159576 | consumed samples: 627392 | elapsed time per iteration (ms): 19797.6 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9460/ 159576 | consumed samples: 629632 | elapsed time per iteration (ms): 20223.0 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9470/ 159576 | consumed samples: 631872 | elapsed time per iteration (ms): 19847.6 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9480/ 159576 | consumed samples: 634112 | elapsed time per iteration (ms): 19783.5 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9490/ 159576 | consumed samples: 636352 | elapsed time per iteration (ms): 19768.8 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9500/ 159576 | consumed samples: 638592 | elapsed time per iteration (ms): 19836.7 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9510/ 159576 | consumed samples: 640832 | elapsed time per iteration (ms): 19791.2 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9520/ 159576 | consumed samples: 643072 | elapsed time per iteration (ms): 19677.8 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9530/ 159576 | consumed samples: 645312 | elapsed 
time per iteration (ms): 19695.3 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 9540/ 159576 | consumed samples: 647552 | elapsed time per iteration (ms): 19697.0 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 9550/ 159576 | consumed samples: 649792 | elapsed time per iteration (ms): 19776.4 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 9560/ 159576 | consumed samples: 652032 | elapsed time per iteration (ms): 19726.6 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 9570/ 159576 | consumed samples: 654272 | elapsed time per iteration (ms): 19764.1 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-25 23:11:05] PULSE: tr8-104B is scheduled to start in 18:13:44 (at 2021-09-26T17:24:50) (1188168 on 'gpu_p13' partition)
[2021-09-25 23:11:05] PULSE: tr8-104B is running for 18:44:04 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)
iteration 9580/ 159576 | consumed samples: 656512 | elapsed time per iteration (ms): 19889.3 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 9590/ 159576 | consumed samples: 658752 | elapsed time per iteration (ms): 19672.3 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 9600/ 159576 | consumed samples: 660992 | elapsed time per iteration (ms): 19668.0 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 9610/ 159576 | consumed samples: 663360 | elapsed time per iteration (ms): 20660.1 | learning rate: 6.000E-05 | global batch size: 240 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 9620/ 159576 | consumed samples: 665760 | elapsed time per iteration (ms): 20759.5 | learning rate: 6.000E-05 | global batch size: 240 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 9630/ 159576 | consumed samples: 668160 | elapsed time per iteration (ms): 20573.3 | learning rate: 6.000E-05 | global batch size: 240 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 9640/ 159576 | consumed samples: 670560 | elapsed time per iteration (ms): 21117.4 | learning rate: 6.000E-05 | global batch size: 240 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 9650/ 159576 | consumed samples: 672960 | elapsed time per iteration (ms): 21312.3 | learning rate: 6.000E-05 | global batch size: 240 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 9660/ 159576 | consumed samples: 675360 | elapsed time per iteration (ms): 20596.0 | learning rate: 6.000E-05 | global batch size: 240 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 9670/ 159576 | consumed samples: 677760 | elapsed time per iteration (ms): 20413.4 | learning rate: 6.000E-05 | global batch size: 240 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 9680/ 159576 | consumed samples: 680160 | elapsed time per iteration (ms): 20820.1 | learning rate: 6.000E-05 | global batch size: 240 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 9690/ 159576 | consumed samples: 682560 | elapsed time per iteration (ms): 20882.2 | learning rate: 6.000E-05 | global batch size: 240 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 9700/ 159576 | consumed samples: 684960 | elapsed time per iteration (ms): 21320.0 | learning rate: 6.000E-05 | global batch size: 240 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 9710/ 159576 | consumed samples: 687360 | elapsed time per iteration (ms): 20632.6 | learning rate: 6.000E-05 | global batch size: 240 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 9720/ 159576 | consumed samples: 689760 | elapsed time per iteration (ms): 20593.0 | learning rate: 6.000E-05 | global batch size: 240 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 9730/ 159576 | consumed samples: 692160 | elapsed time per iteration (ms): 21160.0 | learning rate: 6.000E-05 | global batch size: 240 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 9740/ 159576 | consumed samples: 694560 | elapsed time per iteration (ms): 20918.8 | learning rate: 6.000E-05 | global batch size: 240 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-26 00:11:13] PULSE: tr8-104B is scheduled to start in 17:13:36 (at 2021-09-26T17:24:50) (1188168 on 'gpu_p13' partition)
[2021-09-26 00:11:13] PULSE: tr8-104B is running for 19:44:12 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)
iteration 9750/ 159576 | consumed samples: 696960 | elapsed time per iteration (ms): 20828.1 | learning rate: 6.000E-05 | global batch size: 240 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
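Every iteration record above carries the same pipe-delimited fields, so the interesting columns can be extracted mechanically. A minimal sketch, assuming this exact Megatron-style field layout; the log file name is hypothetical:

import re

# Pull iteration, consumed samples, step time, learning rate and global batch
# size out of Megatron-style iteration lines like the ones above.
LINE_RE = re.compile(
    r"iteration\s+(?P<iter>\d+)/\s*\d+ \| "
    r"consumed samples:\s*(?P<samples>\d+) \| "
    r"elapsed time per iteration \(ms\):\s*(?P<ms>[\d.]+) \| "
    r"learning rate:\s*(?P<lr>[\dE.+-]+) \| "
    r"global batch size:\s*(?P<gbs>\d+)"
)

with open("tr8-104B.log") as f:  # hypothetical file name
    for line in f:
        m = LINE_RE.search(line)
        if m:
            print(m.group("iter"), m.group("samples"), m.group("ms"), m.group("gbs"))

Plotting the extracted step times would make the ~5% slowdown visible when the global batch size ramps from 224 to 240 around iteration 9610.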
iteration 9760/ 159576 | consumed samples: 699360 | elapsed time per iteration (ms): 20766.8 | learning rate: 6.000E-05 | global batch size: 240 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 9768 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
[2021-09-26 00:17:36,090] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/global_step9768/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 9768 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
time (ms) | save-checkpoint: 22024.89
[exiting program after 1190.3113538821538 minutes] datetime: 2021-09-26 00:17:52
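The "Saving model checkpoint: .../global_step9768/mp_rank_00_model_states.pt" lines above have the shape DeepSpeed produces from its checkpoint API. A minimal sketch of how such a save is typically triggered; the helper function is hypothetical, and `engine` is assumed to be an already-initialized DeepSpeed engine:

def save_ds_checkpoint(engine, save_dir):
    """Write a DeepSpeed checkpoint like the global_step9768 one in the log above.

    `engine` is assumed to be a deepspeed.DeepSpeedEngine; with the default
    tag scheme, rank 0 writes <save_dir>/<tag>/mp_rank_00_model_states.pt.
    """
    tag = f"global_step{engine.global_steps}"  # e.g. "global_step9768"
    engine.save_checkpoint(save_dir, tag=tag)

The 22-second save-checkpoint time recorded above is the wall-clock cost of that call for this model size on the shared GPFS scratch filesystem.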
[2021-09-26 01:11:06] PULSE: tr8-104B is scheduled to start in 18:25:25 (at 2021-09-26T19:36:32) (1188168 on 'gpu_p13' partition)
[2021-09-26 02:11:19] PULSE: tr8-104B is scheduled to start in 17:25:12 (at 2021-09-26T19:36:32) (1188168 on 'gpu_p13' partition)
[2021-09-26 03:11:35] PULSE: tr8-104B is scheduled to start in 19:51:55 (at 2021-09-26T23:03:31) (1188168 on 'gpu_p13' partition)
[2021-09-26 04:11:39] PULSE: tr8-104B is scheduled to start in 19:06:56 (at 2021-09-26T23:18:36) (1188168 on 'gpu_p13' partition)
[2021-09-26 05:11:41] PULSE: tr8-104B is scheduled to start in 18:19:12 (at 2021-09-26T23:30:54) (1188168 on 'gpu_p13' partition)
[2021-09-26 06:11:46] PULSE: tr8-104B is scheduled to start in 17:19:07 (at 2021-09-26T23:30:54) (1188168 on 'gpu_p13' partition)
[2021-09-26 07:11:59] PULSE: tr8-104B is scheduled to start in 17:27:45 (at 2021-09-27T00:39:45) (1188168 on 'gpu_p13' partition)
[2021-09-26 08:12:02] PULSE: tr8-104B is scheduled to start in 12:30:49 (at 2021-09-26T20:42:52) (1188168 on 'gpu_p13' partition)
[2021-09-26 09:12:23] PULSE: tr8-104B is scheduled to start in 11:30:28 (at 2021-09-26T20:42:52) (1188168 on 'gpu_p13' partition)
[2021-09-26 10:12:24] PULSE: tr8-104B is scheduled to start in 10:30:27 (at 2021-09-26T20:42:52) (1188168 on 'gpu_p13' partition)
[2021-09-26 11:12:28] PULSE: tr8-104B is scheduled to start in 9:30:23 (at 2021-09-26T20:42:52) (1188168 on 'gpu_p13' partition)
[2021-09-26 12:12:40] PULSE: tr8-104B is scheduled to start in 10:14:45 (at 2021-09-26T22:27:26) (1188168 on 'gpu_p13' partition)
[2021-09-26 13:12:49] PULSE: tr8-104B is scheduled to start in 9:14:36 (at 2021-09-26T22:27:26) (1188168 on 'gpu_p13' partition)
[2021-09-26 14:12:56] PULSE: tr8-104B is scheduled to start in 8:33:42 (at 2021-09-26T22:46:39) (1188168 on 'gpu_p13' partition)
[2021-09-26 15:13:22] PULSE: tr8-104B is scheduled to start in 7:16:41 (at 2021-09-26T22:30:04) (1188168 on 'gpu_p13' partition)
[2021-09-26 16:13:24] PULSE: tr8-104B is scheduled to start in 6:16:39 (at 2021-09-26T22:30:04) (1188168 on 'gpu_p13' partition)
[2021-09-26 17:13:32] PULSE: tr8-104B is scheduled to start in 5:16:31 (at 2021-09-26T22:30:04) (1188168 on 'gpu_p13' partition)
[2021-09-26 18:13:29] PULSE: tr8-104B is scheduled to start in 9:13:25 (at 2021-09-27T03:26:55) (1188168 on 'gpu_p13' partition)
[2021-09-26 19:13:42] PULSE: tr8-104B is scheduled to start in 12:06:13 (at 2021-09-27T07:19:56) (1188168 on 'gpu_p13' partition)
[2021-09-26 20:13:45] PULSE: tr8-104B is scheduled to start in 11:06:10 (at 2021-09-27T07:19:56) (1188168 on 'gpu_p13' partition)
[2021-09-26 21:14:04] PULSE: tr8-104B is scheduled to start in 18:20:04 (at 2021-09-27T15:34:09) (1188168 on 'gpu_p13' partition)
[2021-09-26 22:14:04] PULSE: tr8-104B is scheduled to start in 17:20:04 (at 2021-09-27T15:34:09) (1188168 on 'gpu_p13' partition)
[2021-09-26 23:14:12] PULSE: tr8-104B is scheduled to start in 16:36:40 (at 2021-09-27T15:50:53) (1188168 on 'gpu_p13' partition)
[2021-09-27 00:14:11] PULSE: tr8-104B is scheduled to start in 15:32:33 (at 2021-09-27T15:46:45) (1188168 on 'gpu_p13' partition)
[2021-09-27 01:14:15] PULSE: tr8-104B is scheduled to start in 14:32:29 (at 2021-09-27T15:46:45) (1188168 on 'gpu_p13' partition)
[2021-09-27 02:14:18] PULSE: tr8-104B is scheduled to start in 10:17:12 (at 2021-09-27T12:31:31) (1188168 on 'gpu_p13' partition)
[2021-09-27 03:14:23] PULSE: tr8-104B is scheduled to start in 9:17:07 (at 2021-09-27T12:31:31) (1188168 on 'gpu_p13' partition)
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
[... the launcher prints this warning once per spawned process; the remaining identical copies are omitted ...]
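The warning above appears because the distributed launcher defaults OMP_NUM_THREADS to 1 when it is unset. A minimal sketch of pinning it explicitly before any OpenMP-backed library creates its thread pool; the value 4 is illustrative, not the tuned setting for this run:

import os

# Choose the per-rank OpenMP thread count before importing torch, so the
# launcher default (and its warning) never kicks in.
os.environ.setdefault("OMP_NUM_THREADS", "4")  # illustrative value

import torch  # imported after the env var so its intra-op pools pick it up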
***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] ninja .................. [OKAY] fused_lamb ............. [NO] ....... [OKAY] -------------------------------------------------- sparse_attn ............ [NO] ....... [OKAY] op name ................ installed .. compatible -------------------------------------------------- transformer ............ [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] stochastic_transformer . [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- ninja .................. [OKAY] DeepSpeed C++/CUDA extension op report -------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] ninja .................. [OKAY] sparse_attn ............ [NO] ....... [OKAY] -------------------------------------------------- transformer ............ [NO] ....... [OKAY] op name ................ installed .. compatible stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] ninja .................. [OKAY] fused_adam ............. [NO] ....... [OKAY] -------------------------------------------------- fused_lamb ............. [NO] ....... [OKAY] op name ................ installed .. compatible -------------------------------------------------- sparse_attn ............ [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] transformer ............ [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] ninja .................. 
On ranks whose host environment lacks libaio, the report additionally prints:

 [WARNING]  async_io requires the libraries: ['libaio-dev'] but they are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
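async_io is the only op reported as not compatible ([NO] in both columns), because the libaio development headers are missing on the host. A minimal fix, assuming a Debian/Ubuntu host with root access, following the remedy the warning itself suggests:

    # Install the missing async I/O dependency (Debian/Ubuntu).
    sudo apt install libaio-dev
    # Optionally rebuild DeepSpeed with the op pre-compiled; otherwise
    # async_io will JIT-compile on first use now that libaio is present.
    DS_BUILD_AIO=1 pip install --force-reinstall --no-cache-dir deepspeed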
[OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference utils.. ..................[NO] [YES]....... ......[OKAY] [OKAY] quantizer utils.............. ..................[NO] [YES]....... ......[OKAY] [OKAY] -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. fused_adam ............. [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] fused_lamb ............. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] transformer ............ [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- ninja .................. [OKAY] op name ................ installed .. compatible -------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] op name ................ installed .. compatible fused_adam ............. [NO] ....... [OKAY] -------------------------------------------------- fused_lamb ............. [NO] ....... 
[OKAY] cpu_adam ............... [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] async_io ............... [NO] ....... [NO] fused_adam ............. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. fused_lamb ............. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] sparse_attn ............ [NO] ....... [OKAY] -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] stochastic_transformer . [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. -------------------------------------------------- async_io ............... [NO] ....... [NO] DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] JIT compiled ops requires ninja utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. -------------------------------------------------- async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. -------------------------------------------------- JIT compiled ops requires ninja async_io ............... [NO] ....... [NO] ninja .................. [OKAY] transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- utils .................. [YES] ...... [OKAY] cpu_adam ............... [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] -------------------------------------------------- fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. stochastic_transformer . [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. 
[OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system fused_adam ............. [NO] ....... [OKAY] meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. utils .................. [YES] ...... [OKAY] async_io ............... [NO] ....... [NO] quantizer .............. [NO] ....... [OKAY] async_io ............... [NO] ....... 
[NO] transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] async_io ............... [NO] ....... [NO] transformer_inference transformer_inference.. ..[NO] [NO]....... .......[OKAY] [OKAY] transformer_inference .. [NO] ....... [OKAY] utils ..................utils [YES].................. ......[YES] [OKAY]......  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. utils .................. [YES] ...... [OKAY] [OKAY] async_io ............... [NO] ....... [NO] quantizer .............. [NO] ....... [OKAY] quantizer .............. [NO]quantizer ..................... [OKAY][NO] ....... [OKAY] -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] async_io ............... [NO] ....... [NO] quantizer .............. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. -------------------------------------------------- utils .................. [YES] ...... [OKAY] async_io ............... [NO] ....... [NO] quantizer .............. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer_inference .. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... utils[NO] ......................... [YES][NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... async_io[NO] ............... async_io[NO] ...................... [NO] ...... [OKAY] [NO] ....... transformer_inference[NO] .. [NO] ....... [OKAY] quantizer .............. [NO] transformer_inference....... ..[OKAY] [NO] ....... [OKAY] transformer_inference transformer_inferenceutils.. ....................[NO] [NO][YES]....... .............[OKAY] [OKAY][OKAY] -------------------------------------------------- quantizer utils.............. utils..................[NO] [YES]......................... ...... [OKAY] [YES] [OKAY] ...... [OKAY] -------------------------------------------------- utils .................. [YES] ...... [OKAY] quantizer ..............quantizer [NO].............. .......[NO] [OKAY]....... [OKAY] -------------------------------------------------- -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. quantizer .............. [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io async_io............... ...............[NO] .......[NO] [NO]....... [NO] transformer_inference .. [NO] transformer_inference....... ..[OKAY] [NO] ....... [OKAY] utils .................. [YES] utils...... ..................[OKAY] [YES] ...... [OKAY] quantizer .............. [NO]quantizer ..................... [OKAY][NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... 
[OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io async_io............... ...............[NO] .......[NO] [NO]....... [NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utils ..................utils [YES].................. ......[YES] [OKAY]...... [OKAY] quantizer quantizer.............. ..............[NO] [NO]....... .......[OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference transformer_inference.. ..[NO] [NO]....... .......[OKAY] [OKAY] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] utils .................. [YES] ...... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. utils .................. [YES] ...... [OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- quantizer .............. [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer_inference ..utils [NO].................. .......[YES] ......[OKAY] [OKAY] async_io ............... [NO] ....... [NO] quantizer utils.............. ..................[NO] [YES]....... ......[OKAY] [OKAY] transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninjaninja .................................... [OKAY][OKAY] -------------------------------------------------- -------------------------------------------------- op name op name................ installed................ .. installedcompatible ..-------------------------------------------------- compatible -------------------------------------------------- cpu_adam ............... [YES] cpu_adam...... ...............[OKAY] [YES] ...... [OKAY] fused_adam ............. [NO]fused_adam ....... .............[OKAY] [NO] fused_lamb....... .............[OKAY] [NO] ....... fused_lamb[OKAY] ............. [NO] ....... [OKAY] sparse_attn ............ [NO] .......sparse_attn [OKAY] ............ [NO]transformer ................... [NO][OKAY] ....... [OKAY] transformer ............ stochastic_transformer[NO] ........ [OKAY][NO] ninja .................. [OKAY] ....... [OKAY] -------------------------------------------------- stochastic_transformer . [NO] ....... [OKAY] op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. 
[OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] .......async_io [NO] ............... [NO] ....... [NO] transformer_inference .. [NO] .......transformer_inference [OKAY].. [NO] ....... [OKAY] utils .................. [YES] ......utils [OKAY].................. [YES] ...... quantizer[OKAY] .............. [NO] .......quantizer [OKAY].............. [NO] .......-------------------------------------------------- [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... 
[OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer ............ [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] stochastic_transformer . [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ 
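The async_io op is only needed for DeepSpeed's disk/NVMe offload features, so [NO]/[NO] is harmless for this run. If it were ever needed, a minimal fix on a Debian-based node (assuming root access, which compute nodes typically lack) would be:

# install the missing async I/O development library (requires root)
sudo apt install libaio-dev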
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
--------------------------------------------------
[OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- utils .................. [YES] ...... [OKAY] JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. quantizer .............. [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] DeepSpeed C++/CUDA extension op report -------------------------------------------------- fused_adam ............. [NO] ....... [OKAY] NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] -------------------------------------------------- DeepSpeed C++/CUDA extension op report torch version .................... 1.8.1 -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja torch cuda version ............... 11.1 nvcc version ..................... 11.2 ninja .................. [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science -------------------------------------------------- deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- sparse_attn ............ [NO] ....... [OKAY] DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer ............ [NO] ....... [OKAY] -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] async_io ............... [NO] ....... [NO] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] op name ................ installed .. compatible -------------------------------------------------- utils .................. [YES] ...... [OKAY] cpu_adam ............... [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] -------------------------------------------------- fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] -------------------------------------------------- stochastic_transformer . [NO] ....... [OKAY] DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- ninja .................. [OKAY] op name ................ installed .. compatible -------------------------------------------------- -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- ninja .................. [OKAY] op name ................ installed .. 
compatible -------------------------------------------------- -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. stochastic_transformer . [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 ninja .................. [OKAY] torch cuda version ............... 11.1 nvcc version ..................... 11.2 -------------------------------------------------- op name ................ installed .. compatible deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] -------------------------------------------------- deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. ninja .................. [OKAY] -------------------------------------------------- async_io ............... [NO] ....... [NO] op name ................ installed .. compatible -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. cpu_adam ............... [YES] ...... [OKAY] utils .................. [YES] ...... 
[OKAY] async_io ............... [NO] ....... [NO] fused_adam ............. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. utils .................. [YES] ...... [OKAY] transformer ............ [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- stochastic_transformer . [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] .......  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.[NO] transformer_inference .. [NO] async_io....... [OKAY]............... [NO] ....... [NO] utils .................. [YES] ...... [OKAY] transformer_inference quantizer.. ..............[NO] [NO]....... .......[OKAY] [OKAY] -------------------------------------------------- utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. 
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
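Both the op report and the environment info above are what DeepSpeed's bundled reporter prints. A minimal sketch for reproducing them on a node and clearing the async_io warning, assuming apt access on the image (without it the warning is harmless unless NVMe offload is used, since async_io is the one op that cannot be JIT-compiled here):

    ds_report                 # prints the op report + environment info shown above
    apt install libaio-dev    # the fix named in the warning; needs root
    ds_report                 # async_io should now show [OKAY] under "compatible"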
/bin/sh: line 0: type: git: not found
[the line above was likewise printed repeatedly on ranks whose nodes lack git in PATH; the remaining per-rank duplicates of the blocks above are omitted]
11.1 DeepSpeed general environment info: torch cuda version ............... 11.1 nvcc version ..................... 11.2 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] torch version .................... 1.8.1 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch cuda version ............... 11.1 deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY] [OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizer ..............quantizer [NO] ..................... [NO][OKAY] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- ninja .................. [OKAY] DeepSpeed C++/CUDA extension op report -------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... 
[OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- DeepSpeed general environment info: op name ................ installed .. compatible torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] -------------------------------------------------- torch version .................... 1.8.1 cpu_adam ............... [YES] ...... [OKAY] torch cuda version ............... 11.1 fused_adam ............. [NO] ....... [OKAY] nvcc version ..................... 11.2 fused_lamb ............. [NO] ....... [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] sparse_attn ............ [NO] ....... [OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found DeepSpeed general environment info: /bin/sh: line 0: type: git: not found -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 /bin/sh: line 0: type: git: not found -------------------------------------------------- DeepSpeed C++/CUDA extension op report /bin/sh: line 0: type: git: not found nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science /bin/sh: line 0: type: git: not found -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. /bin/sh: line 0: type: git: not found deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found ninja .................. [OKAY] /bin/sh: line 0: type: git: not found -------------------------------------------------- /bin/sh: line 0: type: git: not found op name ................ installed .. 
compatible -------------------------------------------------- /bin/sh: line 0: type: git: not found cpu_adam ............... [YES] ...... [OKAY] /bin/sh: line 0: type: git: not found fused_adam ............. [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- ninja .................. [OKAY]cpu_adam ...............-------------------------------------------------- [YES] ......op name ................[OKAY] installed .. compatible -------------------------------------------------- fused_adam ............. [NO] cpu_adam....... ...............[OKAY] [YES] ...... fused_lamb[OKAY] ............. [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] sparse_attn fused_lamb............ .............[NO] [NO]....... .......[OKAY] [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer sparse_attn ............. [NO][NO] ....... .......[OKAY] [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] ninjafused_adam ............................... [OKAY][NO] --------------------------------------------------....... op name[OKAY] ................ installed fused_lamb.. compatible............. --------------------------------------------------[NO] ....... [OKAY] DeepSpeed general environment info: cpu_adam ............... [YES] ...... 
[OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 ninja .................. [OKAY] -------------------------------------------------- sparse_attn ............ [NO] ....... [OKAY] torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science op name ................ installed .. compatible fused_adam .............transformer [NO] ................... [NO][OKAY] deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- ....... [OKAY]fused_lamb cpu_adam ............... [YES] ...... [OKAY] ............. [NO] ....... stochastic_transformer[OKAY] fused_adam ............. [NO] ....... [OKAY] . [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found sparse_attn ............ [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] DeepSpeed general environment info: transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch cuda version ............... 11.1 nvcc version ..................... 11.2 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] /bin/sh: line 0: type: git: not found torch version .................... 1.8.1 /bin/sh: line 0: type: git: not found deepspeed info ................... 0.4.2+bc17042, bc17042, big-science /bin/sh: line 0: type: git: not found torch cuda version ............... 11.1 /bin/sh: line 0: type: git: not found deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found nvcc version ..................... 11.2 /bin/sh: line 0: type: git: not found deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science /bin/sh: line 0: type: git: not found deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found ninja .................. [OKAY] /bin/sh: line 0: type: git: not found -------------------------------------------------- /bin/sh: line 0: type: git: not found op name ................ installed .. compatible -------------------------------------------------- /bin/sh: line 0: type: git: not found cpu_adam ............... [YES] ...... [OKAY] /bin/sh: line 0: type: git: not found fused_adam ............. [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found fused_lamb ............. [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found sparse_attn ............ [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found transformer ............ [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 DeepSpeed general environment info: nvcc version ..................... 11.2 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] /bin/sh: line 0: type: git: not found torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch cuda version ............... 11.1 DeepSpeed general environment info:torch install path ............... torch install path ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']............... nvcc version ..................... 11.2 /bin/sh: line 0: type: git: not found torch version .................... 1.8.1 ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] /bin/sh: line 0: type: git: not found deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] torch cuda version torch version............... ....................11.1 deepspeed info ................... 0.4.2+bc17042, bc17042, big-science 1.8.1 deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 nvcc version .....................torch cuda version 11.2............... deepspeed install path11.1 ...........nvcc version .....................['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] 11.2deepspeed info deepspeed install path................... 
...........0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ......deepspeed info torch 1.8, cuda 11.1................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... DeepSpeed general environment info:['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... torch install path0.4.2+bc17042, bc17042, big-science ...............deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science DeepSpeed general environment info: deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 /bin/sh: line 0: type: git: not found torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found DeepSpeed general environment info: /bin/sh: line 0: type: git: not found torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 /bin/sh: line 0: type: git: not found torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 
0.4.2+bc17042, bc17042, big-science /bin/sh: line 0: type: git: not found deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found DeepSpeed general environment info: /bin/sh: line 0: type: git: not found torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] /bin/sh: line 0: type: git: not found DeepSpeed general environment info: /bin/sh: line 0: type: git: not found torch version .................... 1.8.1 /bin/sh: line 0: type: git: not found torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found torch cuda version ............... 11.1 /bin/sh: line 0: type: git: not found torch cuda version ............... 11.1 async_io ............... [NO] ....... [NO] nvcc version ..................... 11.2 nvcc version ..................... 11.2 transformer_inference .. [NO] ....... [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] utils .................. [YES] ...... [OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']DeepSpeed general environment info: DeepSpeed general environment info: torch version .................... 1.8.1 torch install path torch cuda version............... ............... 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] nvcc version .....................['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] 11.2 deepspeed install pathtorch version ............................... 1.8.1['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] torch version .................... 1.8.1 torch cuda version ............... 11.1 deepspeed infotorch cuda version .................................. 0.4.2+bc17042, bc17042, big-science11.1 deepspeed wheel compiled w.nvcc version ........................... torch 1.8, cuda 11.111.2 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: /bin/sh: line 0: type: git: not found torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] DeepSpeed general environment info: deepspeed info ................... 0.4.2+bc17042, bc17042, big-science torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 /bin/sh: line 0: type: git: not found deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 /bin/sh: line 0: type: git: not found torch cuda version ............... 11.1 nvcc version ..................... 11.2 /bin/sh: line 0: type: git: not found deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found deepspeed info ................... 0.4.2+bc17042, bc17042, big-science /bin/sh: line 0: type: git: not found DeepSpeed general environment info: /bin/sh: line 0: type: git: not found deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] /bin/sh: line 0: type: git: not found DeepSpeed general environment info: /bin/sh: line 0: type: git: not found torch version .................... 1.8.1 /bin/sh: line 0: type: git: not found torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch cuda version ............... 11.1 torch version .................... 1.8.1 /bin/sh: line 0: type: git: not found nvcc version ..................... 11.2 torch cuda version ............... 11.1 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science DeepSpeed general environment info: nvcc version ..................... 11.2 deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] torch cuda version ............... 11.1 deepspeed info ................... 0.4.2+bc17042, bc17042, big-science nvcc version ..................... 11.2 /bin/sh: line 0: type: git: not found deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. DeepSpeed general environment info: DeepSpeed general environment info: nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 async_io ............... [NO] ....... [NO] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] transformer_inference .. [NO] ....... [OKAY] torch version .................... 1.8.1 torch cuda version ............... 11.1 torch version .................... 1.8.1 utils .................. [YES] ...... [OKAY] nvcc version ..................... 11.2 torch cuda version ............... 11.1 quantizer .............. [NO] ....... [OKAY] deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] nvcc version ..................... 11.2 -------------------------------------------------- deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 /bin/sh: line 0: type: git: not found deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] /bin/sh: line 0: type: git: not found deepspeed info ................... 0.4.2+bc17042, bc17042, big-science /bin/sh: line 0: type: git: not found deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found DeepSpeed general environment info: /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found torch version .................... 1.8.1 /bin/sh: line 0: type: git: not found torch cuda version ............... 11.1 /bin/sh: line 0: type: git: not found nvcc version ..................... 11.2 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. /bin/sh: line 0: type: git: not found deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] /bin/sh: line 0: type: git: not found async_io ............... [NO] ....... [NO] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. 
[NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science /bin/sh: line 0: type: git: not found deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 transformer_inference .. [NO] ....... [OKAY] torch cuda version ............... 11.1 utils .................. [YES] ...... [OKAY] nvcc version ..................... 11.2 quantizer .............. [NO] ....... [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science -------------------------------------------------- deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info:  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 transformer_inference .. [NO] ....... [OKAY] deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja using world size: 512, data-parallel-size: 16, tensor-model-parallel size: 4, pipeline-model-parallel size: 8 /bin/sh: line 0: type: git: not found using torch.float16 for parameters ... /bin/sh: line 0: type: git: not found DeepSpeed general environment info: /bin/sh: line 0: type: git: not found ------------------------ arguments ------------------------ /bin/sh: line 0: type: git: not found torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] /bin/sh: line 0: type: git: not found torch version .................... 1.8.1 /bin/sh: line 0: type: git: not found torch cuda version ............... 11.1 /bin/sh: line 0: type: git: not found nvcc version ..................... 11.2 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] /bin/sh: line 0: type: git: not found deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found DeepSpeed general environment info: /bin/sh: line 0: type: git: not found accumulate_allreduce_grads_in_fp32 .............. False /bin/sh: line 0: type: git: not found adam_beta1 ...................................... 0.9 adam_beta2 ...................................... 0.999 adam_eps ........................................ 1e-08 adlr_autoresume ................................. False /bin/sh: line 0: type: git: not found torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 
1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1

(The `/bin/sh: line 0: type: git: not found` lines scattered through this startup output mean that `git` is not available in the job's shell environment; this is also why Megatron reports `git_hash=unknown git_branch=unknown` further down.)

adlr_autoresume_interval ........................ 1000
apply_query_key_layer_scaling ................... True
apply_residual_connection_post_layernorm ........ False
attention_dropout ............................... 0.1
attention_softmax_in_fp32 ....................... False
bert_binary_head ................................ True
bert_load ....................................... None
bf16 ............................................ False
bias_dropout_fusion ............................. True
bias_gelu_fusion ................................ True
biencoder_projection_dim ........................ 0
biencoder_shared_query_context_model ............ False
block_data_path ................................. None
checkpoint_activations .......................... True
checkpoint_in_cpu ............................... False
checkpoint_num_layers ........................... 1
clip_grad ....................................... 1.0
codecarbon_dir .................................. /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/tr8-104B-logs/codecarbon
consumed_train_samples .......................... 0
consumed_valid_samples .......................... 0
contigious_checkpointing ........................ False
cpu_optimizer ................................... False
cpu_torch_adam .................................. False
data_impl ....................................... mmap
data_parallel_size .............................. 16
data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
dataloader_type ................................. single
DDP_impl ........................................ local
decoder_seq_length .............................. None
deepscale ....................................... False
deepscale_config ................................ None
deepspeed ....................................... True
deepspeed_activation_checkpointing .............. True
deepspeed_config ................................ ./ds_config.1188168.json
deepspeed_mpi ................................... False
distribute_checkpointed_activations ............. False
distributed_backend ............................. nccl
embedding_path .................................. None
encoder_seq_length .............................. 2048
eod_mask_loss ................................... False
eval_interval ................................... 1000
eval_iters ...................................... 5
evidence_data_path .............................. None
exit_duration_in_mins ........................... 1190
exit_interval ................................... None
ffn_hidden_size ................................. 20480
finetune ........................................ False
fp16 ............................................ True
fp16_lm_cross_entropy ........................... False
fp32_residual_connection ........................ False
global_batch_size ............................... 2048
hidden_dropout .................................. 0.1
hidden_size ..................................... 16384
hysteresis ...................................... 2
ict_head_size ................................... None
ict_load ........................................ None
img_dim ......................................... 224
indexer_batch_size .............................. 128
indexer_log_interval ............................ 1000
init_method_std ................................. 0.02
init_method_xavier_uniform ...................... False
initial_loss_scale .............................. 4294967296
kv_channels ..................................... 512
layernorm_epsilon ............................... 1e-05
lazy_mpu_init ................................... None
load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
local_rank ...................................... 0
log_batch_size_to_tensorboard ................... True
log_interval .................................... 10
log_learning_rate_to_tensorboard ................ True
log_loss_scale_to_tensorboard ................... True
log_num_zeros_in_grad ........................... False
log_params_norm ................................. False
log_timers_to_tensorboard ....................... True
log_validation_ppl_to_tensorboard ............... True
loss_scale ...................................... 12.0
loss_scale_window ............................... 1000
lr .............................................. 6e-05
lr_decay_iters .................................. None
lr_decay_samples ................................ 126953125
lr_decay_style .................................. cosine
lr_warmup_fraction .............................. None
lr_warmup_iters ................................. 0
lr_warmup_samples ............................... 216320
make_vocab_size_divisible_by .................... 128
mask_prob ....................................... 0.15
masked_softmax_fusion ........................... True
max_position_embeddings ......................... 2048
memory_centric_tiled_linear ..................... False
merge_file ...................................... /gpfswork/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/data/gpt2-merges.txt
micro_batch_size ................................ 1
min_loss_scale .................................. 1.0
min_lr .......................................... 6e-06
mmap_warmup ..................................... False
no_load_optim ................................... None
no_load_rng ..................................... None
no_save_optim ................................... None
no_save_rng ..................................... None
num_attention_heads ............................. 32
num_channels .................................... 3
num_classes ..................................... 1000
num_layers ...................................... 32
num_layers_per_virtual_pipeline_stage ........... None
num_workers ..................................... 2
onnx_safe ....................................... None
openai_gelu ..................................... False
optimizer ....................................... adam
override_lr_scheduler ........................... False
params_dtype .................................... torch.float16
partition_activations ........................... False
patch_dim ....................................... 16
pipeline_model_parallel_size .................... 8
position_embedding_type ......................... PositionEmbeddingType.absolute
profile_backward ................................ False
query_in_block_prob ............................. 0.1
rampup_batch_size ............................... ['16', '16', '6_000_000']
rank ............................................ 0
remote_device ................................... none
reset_attention_mask ............................ False
reset_position_ids .............................. False
retriever_report_topk_accuracies ................ []
retriever_score_scaling ......................... False
retriever_seq_length ............................ 256
sample_rate ..................................... 1.0
save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
save_interval ................................... 1500
scatter_gather_tensors_in_pipeline .............. True
scattered_embeddings ............................ False
seed ............................................ 42
seq_length ...................................... 2048
sgd_momentum .................................... 0.9
short_seq_prob .................................. 0.1
split ........................................... 949,50,1
split_transformers .............................. False
synchronize_each_layer .......................... False
tensor_model_parallel_size ...................... 4
tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/tr8-104B-logs/tensorboard
tensorboard_log_interval ........................ 1
tensorboard_queue_size .......................... 5
tile_factor ..................................... 1
titles_data_path ................................ None
tokenizer_name_or_path .......................... None
tokenizer_type .................................. GPT2BPETokenizer
train_iters ..................................... None
train_samples ................................... 300000000
use_checkpoint_lr_scheduler ..................... False
use_contiguous_buffers_in_ddp ................... False
use_cpu_initialization .......................... None
use_one_sent_docs ............................... False
use_pin_memory .................................. False
virtual_pipeline_model_parallel_size ............ None
vocab_extra_ids ................................. 0
vocab_file ...................................... /gpfswork/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/data/gpt2-vocab.json
weight_decay .................................... 0.1
world_size ...................................... 512
zero_allgather_bucket_size ...................... 0.0
zero_contigious_gradients ....................... False
zero_reduce_bucket_size ......................... 0.0
zero_reduce_scatter ............................. False
zero_stage ...................................... 1
-------------------- end of arguments ---------------------
will use batch size rampup starting from global batch size 16 to global batch size 2048 with batch size increments 16 over 6000000 samples.
> building GPT2BPETokenizer tokenizer ...
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meets the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
[WARNING]  async_io requires the libraries: ['libaio-dev'] but they are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****

[... identical environment info, extension op report, async_io warning, and Git info output was printed again, heavily interleaved, by the remaining ranks; duplicates omitted ...]
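It is worth pausing on how the parallelism arguments in the dump above fit together. A minimal sketch of the arithmetic (the variable names are ours, chosen for readability; only the numbers come from the log):

```python
# Sketch of the 3D-parallel layout implied by the argument dump above.
# Variable names are ours; the numbers are taken verbatim from the log.

tensor_mp   = 4      # tensor_model_parallel_size
pipeline_mp = 8      # pipeline_model_parallel_size
data_p      = 16     # data_parallel_size

# The three parallel dimensions multiply out to the full GPU count.
world_size = tensor_mp * pipeline_mp * data_p
assert world_size == 512          # matches world_size in the dump

micro_bs  = 1                     # micro_batch_size
global_bs = 2048                  # global_batch_size (after rampup)

# With 16 data-parallel replicas each feeding micro-batches of 1, a global
# batch of 2048 requires 128 gradient-accumulation micro-steps per replica.
grad_accum_steps = global_bs // (micro_bs * data_p)
assert grad_accum_steps == 128

print(f"{world_size} GPUs; {grad_accum_steps} accumulation steps per optimizer step")
```

So each replica of the model spans 4 x 8 = 32 GPUs, and 16 such replicas train in data parallel.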
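The batch-size rampup announced after the argument dump (`rampup_batch_size = ['16', '16', '6_000_000']`) is easier to read unrolled. A rough sketch of the schedule it implies, assuming the increments are spaced evenly over the ramp (an illustration, not Megatron's scheduler code):

```python
# Unrolling rampup_batch_size = ['16', '16', '6_000_000'] with
# global_batch_size = 2048: start at 16, grow by 16 up to 2048, spread
# over 6,000,000 samples. Assumes evenly spaced increments (illustration
# only; Megatron's scheduler may place the boundaries slightly differently).

start_bs, increment, ramp_samples = 16, 16, 6_000_000
target_bs = 2048

n_increments = (target_bs - start_bs) // increment    # 127 steps of +16
samples_per_stage = ramp_samples // n_increments      # ~47,244 samples each

for stage in range(0, n_increments + 1, 32):          # sample every 32nd stage
    bs = start_bs + stage * increment
    print(f"after ~{stage * samples_per_stage:>9,} samples: global batch size {bs:>4}")
print(f"after ~{ramp_samples:>9,} samples: global batch size {target_bs:>4} (ramp done)")
```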
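Similarly, the learning-rate arguments (`lr = 6e-05`, `min_lr = 6e-06`, `lr_warmup_samples = 216320`, `lr_decay_samples = 126953125`, `lr_decay_style = cosine`) describe a sample-based schedule: linear warmup, then cosine decay to `min_lr`. A small sketch of its shape; this approximates, but is not, Megatron's `AnnealingLR`:

```python
import math

# Shape of the sample-based LR schedule from the dump: linear warmup over
# lr_warmup_samples, then cosine decay to min_lr by lr_decay_samples.
# An approximation of Megatron's AnnealingLR, not the real implementation.

MAX_LR, MIN_LR = 6e-05, 6e-06
WARMUP, DECAY = 216_320, 126_953_125

def lr_at(consumed_samples: int) -> float:
    if consumed_samples < WARMUP:
        return MAX_LR * consumed_samples / WARMUP
    progress = min((consumed_samples - WARMUP) / (DECAY - WARMUP), 1.0)
    return MIN_LR + (MAX_LR - MIN_LR) * 0.5 * (1 + math.cos(math.pi * progress))

for s in (0, 216_320, 10_000_000, 63_000_000, 126_953_125, 300_000_000):
    print(f"{s:>11,} samples -> lr {lr_at(s):.2e}")
```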
0.4.2+bc17042, bc17042, big-science...... torch 1.8, cuda 11.1deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
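The async_io warning is benign here: the AIO op is only needed for DeepSpeed's NVMe offload, and on a shared HPC cluster one typically cannot `apt install` system packages anyway. On a machine where you do control the image, a minimal sketch of the fix and the verification step (using DeepSpeed's stock `ds_report` utility) would be:

    # install the libaio headers the AIO op needs (requires root)
    sudo apt install libaio-dev
    # re-run DeepSpeed's environment/op report to confirm async_io flips to [OKAY]
    ds_report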
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
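In this report only `cpu_adam` is marked installed (`[YES]`); the remaining ops are merely compatible and will be JIT-compiled by ninja the first time they are used, which can add several minutes to job startup. If that matters, DeepSpeed supports prebuilding ops at install time via its `DS_BUILD_*` environment variables; a sketch, with flag names as documented by DeepSpeed:

    # prebuild the fused optimizers instead of JIT-compiling them at runtime
    DS_BUILD_FUSED_ADAM=1 DS_BUILD_FUSED_LAMB=1 pip install -e . --no-cache-dir
    # or prebuild every compatible op:
    # DS_BUILD_OPS=1 pip install -e .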
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1

/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
1.8.1 DeepSpeed general environment info: torch cuda version ............... 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] nvcc version ..................... 11.2 torch version .................... 1.8.1 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] torch cuda version ............... 11.1 deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 
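The async_io warning looks to be benign for this run: that op is only needed for DeepSpeed's AIO/NVMe offload path, which this setup does not appear to use; installing libaio-dev on the nodes would silence it, as the message itself suggests.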
> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
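The dummy-token padding is just Megatron rounding the GPT-2 vocab (50257) up to a multiple that shards evenly across the embedding-parallel ranks. A minimal sketch of that rounding, assuming the effective multiple works out to 512 here (in Megatron-LM it is --make-vocab-size-divisible-by times the tensor-parallel size; the exact factorization for this run is an assumption):

```python
# Minimal sketch of Megatron-style vocab padding; the multiple of 512 is an
# assumption that happens to reproduce the numbers in the log line above.
def pad_vocab_size(orig_size: int, multiple: int = 512) -> int:
    """Round orig_size up to the next multiple of `multiple`."""
    return ((orig_size + multiple - 1) // multiple) * multiple

padded = pad_vocab_size(50257)
print(padded, padded - 50257)  # -> 50688 431, matching the log
```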
torch 1.8, cuda 11.1torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** > setting codecarbon ... **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninja .................................... [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- op nameop name ................................ installedinstalled .... 
compatiblecompatible ---------------------------------------------------------------------------------------------------- cpu_adam ...............cpu_adam [YES] ..................... [YES][OKAY] ...... [OKAY] fused_adam ............. fused_adam[NO] .................... [OKAY][NO] ....... [OKAY] fused_lamb ............. fused_lamb[NO] .................... [NO][OKAY] ....... [OKAY] sparse_attn ............ sparse_attn[NO] ................... [NO][OKAY] ....... [OKAY] transformer transformer............ ............[NO] [NO]....... .......[OKAY] [OKAY] stochastic_transformerstochastic_transformer .. [NO][NO] .............. [OKAY][OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io ............... ...............[NO] [NO]....... .......[NO] [NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utilsutils .................................... [YES][YES] ............ 
[OKAY][OKAY] quantizer quantizer.............. ..............[NO] [NO]....... .......[OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- > initializing torch distributed ... DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: DeepSpeed general environment info:torch install path ............... torch install path ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']............... torch version .................... 1.8.1['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch cuda versiontorch version ................................... 11.11.8.1 nvcc version torch cuda version..................... ...............11.2 11.1deepspeed install path nvcc version........... ..................... 11.2['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed install pathdeepspeed info .............................. 0.4.2+bc17042, bc17042, big-science ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed wheel compiled w. deepspeed info...... ...................torch 1.8, cuda 11.1 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.---------------------------------------------------------------------------------------------------- ----------------------------------------------------------------------------------------------------JIT compiled ops requires ninjaJIT compiled ops requires ninja JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ...................................................... .................. [OKAY][OKAY][OKAY][OKAY] ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- op name op nameop nameop name................ ................................installed ................ installed installed ..installed .... compatible compatible.. compatible ---------------------------------------------------------------------------------------------------- compatible ---------------------------------------------------------------------------------------------------- cpu_adamcpu_adam cpu_adam...............cpu_adam .............................. [YES] ............... [YES][YES] ...... [YES]......[OKAY]...... [OKAY] ...... [OKAY] [OKAY] fused_adam ............. [NO] .......fused_adamfused_adam fused_adam [OKAY] .......................... ............. [NO] fused_lamb [NO][NO] ....... .................... [OKAY]....... [NO][OKAY] [OKAY] fused_lamb.......fused_lamb [OKAY] fused_lamb ............. ............. ............. [NO] [NO] [NO]....... ..............[OKAY] [OKAY][OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ sparse_attn[NO]sparse_attnsparse_attn ........................................... [OKAY] [NO] .......[NO] stochastic_transformer[NO] [OKAY].............. . [OKAY] [OKAY] transformer[NO] transformer............transformer....... ............[OKAY]............[NO] [NO].......[NO] .......[OKAY]....... [OKAY][OKAY] stochastic_transformer stochastic_transformer. stochastic_transformer[NO] . . ....... [NO] [NO] [OKAY].............. [OKAY][OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: DeepSpeed general environment info: torch install path ............... torch install path DeepSpeed general environment info:...............['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version ....................['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']torch install path 1.8.1............... torch version torch cuda version.................... ...............1.8.1 ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']11.1 torch cuda version nvcc versiontorch version ............... ......................................... 1.8.111.111.2 nvcc versiondeepspeed install pathtorch cuda version ............................................... 11.2 11.1['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed install pathnvcc version deepspeed info........... ........................................ ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']11.20.4.2+bc17042, bc17042, big-science deepspeed infodeepspeed install path deepspeed wheel compiled w. ................... ........... ...... 0.4.2+bc17042, bc17042, big-science torch 1.8, cuda 11.1['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed wheel compiled w. deepspeed info ......................... torch 1.8, cuda 11.10.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- -------------------------------------------------- ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninja JIT compiled ops requires ninja JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- op name op nameop nameop name ................ ................................ ................ installed installedinstalledinstalled .... .. .. compatiblecompatiblecompatiblecompatible ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- cpu_adam cpu_adamcpu_adam............... cpu_adam...............[YES]............... ............... [YES][YES] ...... [YES]...... ...... [OKAY] ......[OKAY] [OKAY] [OKAY] fused_adamfused_adam .............fused_adam............. fused_adam [NO][NO]............. ............. ....... .......[NO][NO][OKAY] .......[OKAY]....... fused_lamb[OKAY][OKAY] fused_lamb ............. .............[NO] fused_lamb fused_lamb[NO] ....... ............. .................... [OKAY] [NO] [NO] [OKAY] ....... ....... [OKAY][OKAY] sparse_attn ............ [NO] ....... [OKAY]sparse_attn ............ transformer[NO]sparse_attn sparse_attn............................... [OKAY]............[NO] [NO] [NO] ....... transformer....... ....... [OKAY] ............[OKAY][OKAY] [NO]transformer .......stochastic_transformer............ transformer[OKAY] .[NO]............ [NO].......[NO]stochastic_transformer [OKAY].............. .[OKAY] [OKAY][NO]stochastic_transformer ....... .[OKAY]stochastic_transformer [NO] ........ [NO][OKAY] ....... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- async_io ............... [NO] ....... [NO] transformer_inference .. 
[NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ------------------------------------------------------------------------------------------------------------------------------------------------------ NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja ------------------------------------------------------------------------------------------------------------------------------------------------------ JIT compiled ops requires ninjaJIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninja ......................................................ninja [OKAY][OKAY][OKAY].................. [OKAY]------------------------------------------------------------------------------------------------------------------------------------------------------ op name--------------------------------------------------op nameop name ................................op name................ installed................installedinstalled ..installed.... compatible..compatiblecompatible compatible-------------------------------------------------- ---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adam cpu_adam...............cpu_adamcpu_adam ...............[YES].............................. [YES]......[YES][YES] [OKAY].................. [OKAY][OKAY][OKAY] fused_adam ............. fused_adamfused_adam[NO] fused_adam............. ............. ....... ............. [NO][NO] [OKAY] .......[NO]....... [OKAY].......[OKAY]fused_lamb [OKAY]............. fused_lamb[NO]fused_lamb .............fused_lamb.................... [NO].............[OKAY][NO] ....... ....... [NO] [OKAY] [OKAY] ....... [OKAY] sparse_attn ............ [NO]sparse_attn .......sparse_attn............ [OKAY]............sparse_attn[NO] [NO]................... .......transformer[NO][OKAY] [OKAY] ................... 
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
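The environment block can be cross-checked directly from Python; a minimal sketch, assuming the tr1-13B conda environment is activated:

    python -c 'import torch, deepspeed; print(torch.__version__, torch.version.cuda, deepspeed.__version__)'
    # expected, per the report above: 1.8.1 11.1 0.4.2+bc17042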
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****

Each process also fails to locate `git` on the compute node, so Megatron records the hash and branch as unknown.
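Megatron's start-up banner falls back to "unknown" when `git` is absent or the source tree is not a checkout. A hypothetical shell sketch of that fallback behaviour (not Megatron's actual code):

    # resolve commit and branch, defaulting to "unknown" when git is missing
    git_hash=$(git rev-parse --short HEAD 2>/dev/null || echo unknown)
    git_branch=$(git rev-parse --abbrev-ref HEAD 2>/dev/null || echo unknown)
    echo "**** Git info for Megatron: git_hash=${git_hash} git_branch=${git_branch} ****"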
Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... async_io[NO] ............... [NO] ....... [NO] transformer_inference .. [NO] .......transformer_inference [OKAY].. [NO] .......utils [OKAY].................. [YES] ...... [OKAY] utils quantizer.................. ..............[YES] [NO]...... .......[OKAY] [OKAY] quantizer-------------------------------------------------- .............. [NO] ....... [OKAY] -------------------------------------------------- ninjaninjaninja ..................ninja .................. .................. [OKAY] [OKAY][OKAY].................. ----------------------------------------------------------------------------------------------------[OKAY] -------------------------------------------------- --------------------------------------------------op nameop name op name op name................................ ................ installed ................ ..installed installedinstalled ..compatible.. .. compatible compatible-------------------------------------------------- compatible ---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adamcpu_adam cpu_adam...............cpu_adam............... [YES]..............................[YES] [YES]......[YES]...... [OKAY] ...... ......[OKAY] [OKAY] [OKAY] fused_adam ............. [NO] fused_adam.......fused_adam fused_adam .............[OKAY]............. .............[NO] [NO] [NO] fused_lamb ....... .............. ............. [OKAY][NO][OKAY] [OKAY] ....... [OKAY]fused_lambfused_lamb fused_lamb ....................................... [NO][NO][NO] ..............sparse_attn ....... [OKAY][OKAY]............ [OKAY] [NO] ninjaninjaninja ninja ...................................................... .................. [OKAY] [OKAY][OKAY][OKAY] ....... [OKAY] ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- transformer sparse_attn............sparse_attn sparse_attn ............ [NO]........................ [NO]....... [NO] .......[NO].......[OKAY] op nameop name op name................op name................ ................installed................installed installed..installed.. ..compatiblecompatible.. [OKAY] .......[OKAY]stochastic_transformer [OKAY] transformer --------------------------------------------------compatible --------------------------------------------------compatible -------------------------------------------------- -------------------------------------------------- .transformer............ [NO]transformer............ ............ ....... [NO][NO][NO] .......[OKAY] .............. cpu_adam ...............cpu_adam [YES]............... cpu_adam......cpu_adam [YES] ............... ...............[OKAY] ...... [YES][OKAY][YES] [OKAY] [OKAY][OKAY] ............ [OKAY][OKAY] stochastic_transformer stochastic_transformerstochastic_transformer. .[NO]. [NO].......[NO] ..............[OKAY] [OKAY][OKAY] fused_adam ............. fused_adam[NO] .................... [NO]fused_adam[OKAY] fused_adam.................... fused_lamb[OKAY] ............. [NO] .............fused_lamb [NO] ....... [NO].................... .......[NO][OKAY] [OKAY] [OKAY] ....... [OKAY]fused_lambfused_lamb .......................... [NO][NO] .............. [OKAY][OKAY] sparse_attn ............sparse_attn [NO]............ .......[NO] [OKAY]....... 
[OKAY] transformersparse_attn ............transformersparse_attn ............[NO]............ [NO]...................[NO] [OKAY].......[NO] ....... ....... [OKAY] [OKAY] stochastic_transformer[OKAY] transformerstochastic_transformer . ............transformer . [NO][NO]............[NO] ..............[NO] ....... [OKAY] [OKAY] [OKAY]....... stochastic_transformer[OKAY] . [NO]stochastic_transformer ....... [OKAY]. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja JIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninjaninjaninjaninja .................. .................................... .................. [OKAY] [OKAY][OKAY] [OKAY] -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- --------------------------------------------------op name op nameop name................ op name................................ installedinstalled................installed .. installed....compatible compatible..--------------------------------------------------compatible --------------------------------------------------compatible-------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES]cpu_adam cpu_adam......cpu_adam ...............[OKAY] ............... [YES]...............[YES] ......[YES]...... [OKAY][OKAY]...... fused_adam [OKAY]............. [NO] ....... [OKAY] fused_adamfused_adamfused_lamb .............fused_adam............. ............. [NO][NO] ............. [NO].............. [OKAY][NO].......[OKAY] .......[OKAY] [OKAY]fused_lamb .............fused_lamb fused_lamb[NO]............. sparse_attn[NO].................... [OKAY] ................... [NO] [NO].......[OKAY] .......[OKAY] [OKAY] transformersparse_attn ........................ [NO][NO] sparse_attn ....... ....... ............ [OKAY]sparse_attn [OKAY] [NO] ............ stochastic_transformer .......[NO]transformer. ...................[OKAY][NO] [OKAY] [NO] .............. transformer[OKAY] [OKAY]transformer ............ 
............[NO] [NO]....... stochastic_transformer.......[OKAY] [OKAY]. [NO]stochastic_transformer .......stochastic_transformer .[OKAY] .[NO] [NO]....... .......[OKAY] [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ...............async_io [NO]............... .......[NO] [NO]....... [NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY] [OKAY] [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- op nameop nameop nameop name ................................................................ installedinstalledinstalledinstalled ........ compatible compatiblecompatible compatible ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adam cpu_adam...............cpu_adamcpu_adam ..............................[YES]............... [YES]......[YES][YES] ......[OKAY]............ [OKAY][OKAY] [OKAY] fused_adam .............fused_adamfused_adam [NO].............fused_adam............. ....... [NO]............. 
[NO] [OKAY] .......[NO] ....... [OKAY].......[OKAY] fused_lamb[OKAY] .............fused_lamb fused_lamb [NO] .......................... fused_lamb....... [NO] [NO] .............[OKAY]....... [NO].......[OKAY] .......[OKAY] [OKAY] sparse_attn ............ [NO]sparse_attn sparse_attn ....... ............ ............sparse_attn[OKAY] [NO][NO]............ ..............transformer[NO] [OKAY] [OKAY]............ ....... [NO][OKAY] transformer ....... transformer ............ [OKAY]transformer ............ [NO]............ .......[NO][NO] [OKAY]..............stochastic_transformer [OKAY][OKAY] .stochastic_transformer [NO] stochastic_transformer....... . stochastic_transformer[OKAY].[NO] [NO]........ .......[OKAY][NO] [OKAY]....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utils ..................utils [YES].................. ......[YES] [OKAY]...... [OKAY] quantizer .............. quantizer[NO] ..................... [NO][OKAY] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- ninjaninjaninjaninja .................. .................................... ..................[OKAY] [OKAY][OKAY][OKAY] -------------------------------------------------- ----------------------------------------------------------------------------------------------------op name -------------------------------------------------- ................ op nameop name op name ................installed ................ ..................installedinstalled compatible..installed.. --------------------------------------------------compatible.. compatible -------------------------------------------------- compatible -------------------------------------------------- -------------------------------------------------- cpu_adam ...............cpu_adam [YES]...............cpu_adam cpu_adam ......[YES] ............... ...............[OKAY] ...... [YES] [YES][OKAY] ............ [OKAY][OKAY] fused_adam ............. [NO] fused_adam....... .............[OKAY] fused_adam[NO]fused_adam fused_lamb................................. [OKAY] .............[NO][NO] [NO] fused_lamb....... ....... ....................[OKAY] [NO][OKAY][OKAY] fused_lamb....... [OKAY]fused_lamb............. .............[NO] [NO]....... .......[OKAY] [OKAY] sparse_attn ............ [NO] sparse_attn....... ............[OKAY] [NO] .......transformer sparse_attn[OKAY] sparse_attn ............ ............ transformer............ [NO] [NO] ............[NO] ....... .......[NO][OKAY]....... .......[OKAY][OKAY] stochastic_transformer[OKAY] transformer. transformerstochastic_transformer............ [NO] ............ [NO]........ [NO] [NO]....... [OKAY] ..............[OKAY] [OKAY][OKAY] stochastic_transformer stochastic_transformer. [NO] ........ [NO][OKAY] ....... 
[OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO] utils....... [OKAY] .................. [YES] ......utils [OKAY] .................. [YES] ...... [OKAY] quantizer ..............quantizer [NO].............. [NO] ....... .......[OKAY] [OKAY] -------------------------------------------------- -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ------------------------------------------------------------------------------------------------------------------------------------------------------ --------------------------------------------------DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninjaJIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utilsutils .................................... [YES][YES] ............ 
[OKAY][OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science 0.4.2+bc17042, bc17042, big-sciencedeepspeed wheel compiled w. deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] ninjaninjaninjaninja .................. .................. ..................[OKAY].................. [OKAY] quantizer .............. [NO] ....... [OKAY] [OKAY][OKAY] -------------------------------------------------- ----------------------------------------------------------------------------------------------------op name -------------------------------------------------- -------------------------------------------------- op name ................op name op name installed................ ................installed ................ installed.... installed ..compatiblecompatible.. compatible--------------------------------------------------compatible-------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adamcpu_adam ...............cpu_adamcpu_adam............... [YES]..............................[YES] [YES]...... ......[YES] ...... [OKAY] ......[OKAY] [OKAY] [OKAY] fused_adam .............fused_adam fused_adamfused_adam [NO] ....................................... ....... [NO][NO] [NO][OKAY] .............. .......[OKAY] fused_lamb [OKAY][OKAY]............. [NO]fused_lamb .......fused_lambfused_lamb............. [NO][OKAY] ............. .................... [NO][NO][OKAY] .............. [OKAY][OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attntransformer ........................ sparse_attn [NO]sparse_attn [NO] ....... ............................... 
[OKAY][NO][OKAY][NO] ..............transformer stochastic_transformer ............[OKAY] [OKAY] . [NO] [NO]transformer.......transformer ...............................[OKAY] [NO][OKAY][NO] stochastic_transformer ....... ....... [OKAY]. [OKAY] [NO] stochastic_transformer....... stochastic_transformer[OKAY]. .[NO] [NO]....... .......[OKAY] [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ------------------------------------------------------------------------------------------------------------------------------------------------------ JIT compiled ops requires ninja JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report-------------------------------------------------- JIT compiled ops requires ninja-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... transformer_inference[OKAY] .. [NO] ....... utils[OKAY] .................. [YES] ...... utils[OKAY] .................. [YES] quantizer...... ..............[OKAY] [NO] ....... quantizer[OKAY] .............. [NO] .......-------------------------------------------------- [OKAY] -------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report -------------------------------------------------- ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja --------------------------------------------------JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ...................................................... .................. [OKAY][OKAY][OKAY][OKAY] ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- op nameop name op name ................ ................op name................ installedinstalled................installed ....installed.. compatible compatible.. compatible ---------------------------------------------------------------------------------------------------- compatible -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. cpu_adamcpu_adam ..............................cpu_adam cpu_adam[YES][YES] .................................... ...... [YES][YES][OKAY] ......[OKAY]...... [OKAY][OKAY] async_io ............... [NO] ....... [NO] fused_adam fused_adam............. fused_adam.............[NO] .............fused_adam[NO]....... [NO] .................... [OKAY]....... [OKAY][NO][OKAY] transformer_inference .. [NO] ....... [OKAY] .......fused_lamb [OKAY]fused_lambfused_lamb............. utils .................. [YES] ...... [OKAY] ..........................[NO] fused_lamb[NO] [NO] .................... ....... .......[NO][OKAY] [OKAY] ....... quantizer .............. [NO] ....... [OKAY] [OKAY] [OKAY] -------------------------------------------------- sparse_attnsparse_attnsparse_attn sparse_attn............ ........................ [NO] ............ [NO] .......[NO] [NO] .......[OKAY] ....... ....... [OKAY] [OKAY]transformer[OKAY] ............transformer transformertransformer [NO] ........................................... [NO] [NO][NO][OKAY] ..................... [OKAY][OKAY][OKAY] stochastic_transformer .stochastic_transformer stochastic_transformerstochastic_transformer [NO] .......... [OKAY][NO][NO][NO] ....... ....... ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [OKAY][OKAY] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninjaninjaninjaninja ...................................................... .................. 
[OKAY][OKAY] [OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- op nameop name op name................................op name ................installedinstalled ................installed .... installedcompatiblecompatible.. -------------------------------------------------- ..--------------------------------------------------compatible compatible ---------------------------------------------------------------------------------------------------- cpu_adam ...............cpu_adam [YES]...............cpu_adam cpu_adam ......[YES] ............... [OKAY] .....................[YES] [YES][OKAY]...... ......[OKAY] [OKAY] fused_adam ............. [NO] ....... fused_adam[OKAY] fused_adam ............. fused_adam.............fused_lamb[NO] [NO]................................. .......[NO][NO][OKAY] [OKAY].............. [OKAY]fused_lamb[OKAY] fused_lamb ............. fused_lamb[NO]............. ....................[NO] [NO][OKAY]....... [OKAY].......sparse_attn ............[OKAY] [NO] ....... [OKAY] sparse_attn transformer............ ............sparse_attn[NO] ............[NO]....... sparse_attn .......[NO] [OKAY] ............ [OKAY]....... transformer[NO][OKAY] stochastic_transformer................... transformer [NO].[OKAY] ....... [NO]............[OKAY] .......transformer[NO] [OKAY]............ .......stochastic_transformer [NO][OKAY] ........ [NO][OKAY]stochastic_transformer ....... [OKAY] .stochastic_transformer [NO] ........ [OKAY][NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
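Both the op report and the environment info are emitted by every launched rank at startup. They can be regenerated on a single node with DeepSpeed's bundled diagnostic (a minimal sketch, assuming the tr1-13B conda environment is active):

    pip install ninja   # required for JIT-compiling the ops marked [NO]
    ds_report           # prints the same op report and general environment info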
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
........................[NO] [OKAY][YES]....... ......[OKAY] [OKAY]quantizer .............. [NO]-------------------------------------------------- .......quantizer [OKAY].............. [NO] .......-------------------------------------------------- [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system /bin/sh: line 0: type: git: not found meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found torch version torch version.................... ....................1.8.1 1.8.1 torch cuda version ...............torch cuda version 11.1............... nvcc version11.1 .....................nvcc version 11.2..................... deepspeed install path11.2 ...........deepspeed install path ...........['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ................... deepspeed info0.4.2+bc17042, bc17042, big-science ................... deepspeed wheel compiled w. 0.4.2+bc17042, bc17042, big-science...... torch 1.8, cuda 11.1deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. -------------------------------------------------- async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report-------------------------------------------------- ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninja JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY] [OKAY] -------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------ op name op name op nameop name ................ ................................ ................ installedinstalledinstalledinstalled ........ compatible compatible compatible compatible ---------------------------------------------------------------------------------------------------- -------------------------------------------------- -------------------------------------------------- cpu_adamcpu_adam cpu_adam..............................cpu_adam [YES]...............[YES]............... ......[YES]...... [YES] [OKAY] [OKAY]............ [OKAY][OKAY] fused_adamfused_adam .......................... fused_adamfused_adam[NO] [NO] .................... ............. ....... [NO][OKAY] [NO] .......[OKAY]....... [OKAY][OKAY]fused_lamb fused_lamb............. .............fused_lamb[NO] fused_lamb....... [NO] ............. .............[OKAY]....... [NO][NO][OKAY] .............. [OKAY][OKAY] sparse_attn ............ [NO]sparse_attn ................... sparse_attnsparse_attn[OKAY] [NO]........................ transformer.......[NO][NO] ............ [OKAY]....... .......[NO]transformer[OKAY] .......[OKAY]............ [OKAY]transformer [NO] .......transformer............ stochastic_transformer [NO] [OKAY] ............ ........ [NO][NO][OKAY] ..............stochastic_transformer [OKAY][OKAY]stochastic_transformer. [NO] ........stochastic_transformer [NO] [OKAY] ........ [NO][OKAY] ninjaninjaninjaninja .................. .................. .................................... [OKAY][OKAY] [OKAY] [OKAY] -------------------------------------------------- ....... [OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ op name op nameop name................op name installed................................................ installedinstalled..installed .. ....compatible compatiblecompatible--------------------------------------------------compatible ------------------------------------------------------------------------------------------------------------------------------------------------------ cpu_adam cpu_adam...............cpu_adam cpu_adam...............[YES] ..............................[YES] ......[YES]......[YES] ......[OKAY]...... [OKAY] [OKAY] [OKAY] fused_adam fused_adam............. fused_adam.............fused_adam .............[NO] [NO]............. ....... [NO] ....... .......[OKAY][NO] [OKAY][OKAY]....... fused_lamb[OKAY] fused_lamb.............fused_lamb .............fused_lamb.............[NO] [NO][NO].................... ..............[NO] [OKAY] [OKAY] ....... 
[OKAY] [OKAY] sparse_attn ............ [NO]sparse_attn .......sparse_attnsparse_attn ............ ............[OKAY]............ [NO][NO] [NO] ..............transformer....... [OKAY]............[OKAY][OKAY] [NO] transformertransformer ....... ............transformer............ [OKAY] [NO]............ [NO]....... [NO][OKAY].......stochastic_transformer ....... .[OKAY][OKAY] [NO]stochastic_transformer ....... .[OKAY]stochastic_transformer stochastic_transformer [NO] ......... [OKAY][NO][NO] .............. [OKAY][OKAY] DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] DeepSpeed general environment info:DeepSpeed general environment info: deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science torch install pathtorch install path ............... ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] DeepSpeed general environment info: deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science DeepSpeed general environment info: deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 
0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io async_io............... [NO]............... .......[NO] [NO]....... [NO] transformer_inference .. [NO]transformer_inference ......... [OKAY][NO] ....... [OKAY] utils .................. [YES]utils ........................ [OKAY][YES] ...... [OKAY] quantizer ..............quantizer [NO].............. .......[NO] [OKAY]....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY] [OKAY] -------------------------------------------------- deepspeed info ................... 0.4.2+bc17042, bc17042, big-science -------------------------------------------------- ---------------------------------------------------------------------------------------------------- op name op name op name................op name................ installed................installed ................ ..installed.. compatiblecompatibleinstalled.. --------------------------------------------------..--------------------------------------------------compatible deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 compatible-------------------------------------------------- -------------------------------------------------- DeepSpeed general environment info: cpu_adam cpu_adam............... ...............[YES] cpu_adam cpu_adam[YES]...... .............................. ...... [OKAY][YES][OKAY][YES] ............ [OKAY][OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] fused_adamfused_adam .......................... fused_adam[NO]fused_adam [NO]....... [OKAY] ............. torch version .................... 
1.8.1 .................... fused_lamb[OKAY] [NO][NO] torch cuda version ............... 11.1 ............. ....... .......fused_lamb [NO] [OKAY][OKAY].................... [OKAY][NO]fused_lamb nvcc version ..................... 11.2 fused_lamb.................... .............[NO][OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] [NO]....... .......[OKAY] [OKAY]sparse_attn ............ [NO] ....... [OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science sparse_attn transformer............ sparse_attn............[NO]sparse_attn ............[NO]....... ............[NO] [OKAY]....... [NO] ....... [OKAY] transformer[OKAY]....... deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ............[OKAY]stochastic_transformertransformer [NO]. ...................transformer ............[NO] [NO][OKAY] [NO] ....... ..............[OKAY] stochastic_transformer[OKAY][OKAY] . [NO]stochastic_transformer stochastic_transformer....... .[OKAY]. [NO][NO] .............. [OKAY][OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] transformer_inference utils.. ..................[NO] [YES]....... ......[OKAY] deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. [OKAY] quantizer .............. utils[NO] ......................... [YES][OKAY] --------------------------------------------------JIT compiled ops requires ninja --------------------------------------------------JIT compiled ops requires ninja ...... [OKAY] -------------------------------------------------- JIT compiled ops requires ninja quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninjaninjaninjaninja .................................... .................. ..................[OKAY][OKAY] [OKAY][OKAY]-------------------------------------------------- -------------------------------------------------- ----------------------------------------------------------------------------------------------------op name op name................op nameop name ................installed................................ installed ..installed installed ..compatible.... compatible--------------------------------------------------compatiblecompatible ------------------------------------------------------------------------------------------------------------------------------------------------------ cpu_adam ............... [YES]cpu_adam cpu_adamcpu_adam...... ...............[OKAY]............... ............... [YES] [YES] [YES] ...... ...... ...... [OKAY] [OKAY] [OKAY] fused_adam ............. [NO] ....... fused_adam[OKAY]fused_adam fused_adam .......................... [NO]fused_lamb.............[NO] ....... .............[NO].......[OKAY] [NO][OKAY] .......fused_lamb....... [OKAY].............[OKAY]fused_lamb [NO]............. [NO]fused_lamb....... ....................[OKAY] [OKAY][NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attnsparse_attn transformer........................ ............ sparse_attn[NO] [NO] [NO] ................... ....... ....... [NO][OKAY][OKAY] .......[OKAY] transformertransformer[OKAY] ........................stochastic_transformer transformer [NO][NO]............. ..............[NO][NO] [OKAY].......[OKAY]....... [OKAY][OKAY] stochastic_transformer stochastic_transformer .. 
stochastic_transformer [NO] [NO] .............. . [OKAY] [OKAY] [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report -------------------------------------------------- -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report --------------------------------------------------JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- JIT compiled ops requires ninja--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. **** Git info for Megatron: git_hash=unknown git_branch=unknown **** async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] DeepSpeed general environment info: quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] async_io ............... [NO] ....... [NO] torch version .................... 1.8.1 transformer_inference .. [NO] ....... [OKAY] torch cuda version ............... 11.1 utils .................. [YES] ...... [OKAY] nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] quantizer .............. [NO] ....... [OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. 
...... torch 1.8, cuda 11.1 --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- JIT compiled ops requires ninja----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja JIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... 
transformer_inference[NO] ......... [NO][NO] ....... [OKAY] utils .................. [YES]transformer_inference ........ [OKAY][NO] ....... [OKAY]quantizer .............. [NO] ....... [OKAY]utils .................. [YES]-------------------------------------------------- ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninjaninjaninjaninja ...................................................... [OKAY][OKAY][OKAY] .................. -------------------------------------------------- ---------------------------------------------------------------------------------------------------- [OKAY]op name op name op name ................ ................--------------------------------------------------................ installed installedinstalled .. .. ..op name compatiblecompatible compatible ----------------------------------------------------------------------------------------------------................ --------------------------------------------------installed .. compatible cpu_adam-------------------------------------------------- cpu_adam ...............cpu_adam ...............[YES]............... [YES]......[YES] ......[OKAY]...... [OKAY]cpu_adam[OKAY] ............... [YES] ...... [OKAY]fused_adam ............. fused_adam[NO] fused_adam ............. ....... ............. [NO] [OKAY] [NO] ....... .......fused_adam[OKAY]fused_lamb [OKAY].......................... fused_lamb [NO]fused_lamb[NO] ............. ....... ....................[NO][OKAY] [NO] [OKAY].............. [OKAY][OKAY] fused_lamb ............. [NO] ....... sparse_attn[OKAY] ............ [NO] ....... sparse_attnsparse_attn[OKAY] ........................ [NO]transformer[NO] .......................... sparse_attn [OKAY][OKAY] [NO] transformer................... ............transformer[OKAY][NO] [NO]................... stochastic_transformer .......[NO][OKAY] [OKAY]........ [OKAY][NO]transformer stochastic_transformer................... stochastic_transformer [OKAY].[NO] . [NO] [NO].............. ....... [OKAY][OKAY] [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- ----------------------------------------------------------------------------------------------------op nameop name ................op name................op name installed ................installed ................ .. .. installedinstalledcompatible compatible .. ..-------------------------------------------------- -------------------------------------------------- compatiblecompatible ---------------------------------------------------------------------------------------------------- cpu_adam ...............cpu_adam [YES]............... cpu_adamcpu_adam......[YES] ....................................[OKAY] [YES][OKAY][YES] ............ [OKAY][OKAY] fused_adam ............. [NO] fused_adam....... .............[OKAY] [NO]fused_adamfused_adam fused_lamb.................... [OKAY].............[NO] ............. [NO]fused_lamb.......[NO] ........................... [OKAY] [NO] [OKAY] [OKAY] ....... [OKAY]fused_lamb fused_lamb............. .............[NO] [NO]....... 
.......sparse_attn[OKAY] [OKAY]............ sparse_attn[NO] ................... [NO][OKAY] ....... [OKAY] transformersparse_attn transformer............sparse_attn [NO] ........................ ............ .......[NO][NO] .......[OKAY][NO] ....... [OKAY]....... [OKAY]stochastic_transformer [OKAY]stochastic_transformer .transformer . [NO]transformer ............ [NO] ................... [NO]....... [OKAY] [NO] [OKAY] .............. [OKAY][OKAY] stochastic_transformerstochastic_transformer .. [NO] [NO]....... .......[OKAY] [OKAY] DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info deepspeed info................... ...................0.4.2+bc17042, bc17042, big-science 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja --------------------------------------------------JIT compiled ops requires ninjaJIT compiled ops requires ninja JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 JIT compiled ops requires ninja-------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- async_io ............... [NO] ....... [NO] DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found ninjaninjaninjaninja ........................................................................ 
[OKAY][OKAY][OKAY][OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** op nameop nameop name op name ................ ................................ ................ installed installedinstalled installed .. .... .. compatible compatible compatiblecompatible -------------------------------------------------- -------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adamcpu_adam cpu_adamcpu_adam............... .............................................[YES] [YES]......[YES][YES] ......[OKAY]............ [OKAY] [OKAY] [OKAY] fused_adam .............fused_adam fused_adam fused_adam[NO] ............. ............. ............. .......[NO] [NO] [NO] [OKAY] .............. ....... [OKAY][OKAY][OKAY] fused_lamb .............fused_lamb fused_lambfused_lamb [NO] ............. .......................... ....... [NO] [NO][NO] [OKAY] ....... .............. [OKAY][OKAY][OKAY] sparse_attn ............ [NO]sparse_attn sparse_attnsparse_attn....... ........................[OKAY]............ DeepSpeed general environment info: [NO][NO][NO] .....................transformer [OKAY][OKAY][OKAY]............ torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] [NO] transformer.......transformer transformer ............ [OKAY] ........................ [NO] [NO][NO]....... ..............stochastic_transformer[OKAY] torch version .................... 1.8.1 [OKAY][OKAY] . [NO] .......stochastic_transformerstochastic_transformerstochastic_transformer [OKAY] torch cuda version ............... 11.1 ... [NO][NO] [NO] .............. .......[OKAY][OKAY] [OKAY] nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] /bin/sh: line 0: type: git: not found quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... 
Every rank printed the same DeepSpeed startup report, so the raw log interleaved dozens of identical copies on top of each other; a single deduplicated copy follows:

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
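The ops reported as installed `[NO]` but compatible `[OKAY]` (fused_adam, fused_lamb, sparse_attn, transformer, stochastic_transformer) are compiled by ninja the first time they are used, which is what the "JIT compiled ops requires ninja" line refers to. A minimal sketch of what such a first use looks like, using `FusedAdam` as a stand-in for any JIT-compiled op (assumes a CUDA device and the deepspeed install shown above):

```python
# Sketch: constructing an op that was not prebuilt ([NO] in the report)
# triggers a one-time ninja JIT build of its CUDA extension; subsequent
# runs reuse the cached build artifacts.
import torch
from deepspeed.ops.adam import FusedAdam  # fused_adam: installed [NO] above

params = [torch.nn.Parameter(torch.randn(16, 16, device="cuda"))]
optimizer = FusedAdam(params, lr=1e-4)  # JIT compile happens here, once
```

Ops can also be prebuilt at install time via `DS_BUILD_*` environment flags (e.g. `DS_BUILD_FUSED_ADAM=1`), which is presumably why cpu_adam already shows `[YES]` in this environment.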
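As for the libaio warning: `apt install libaio-dev` is not an option on a shared HPC system without root, but to my understanding the async_io op is only exercised by NVMe-offload code paths, which this training does not use, so `async_io ....... [NO]` should be harmless here. A trivial sanity check that the rest of DeepSpeed is unaffected:

```python
# Sketch: deepspeed imports and reports its version without libaio present;
# the async_io extension is only loaded on demand (to my understanding, by
# NVMe-offload features that this run does not enable).
import deepspeed
print("deepspeed", deepspeed.__version__, "imported without libaio")
```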
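The "general environment info" fields are all recoverable from torch itself, which is handy when checking a node interactively; a minimal sketch using only public torch attributes (this is not necessarily how DeepSpeed's `ds_report` assembles them):

```python
# Sketch: reproduce the environment-info fields from public torch attributes.
import torch
from torch.utils.cpp_extension import CUDA_HOME

print("torch install path ...", list(torch.__path__))  # .../site-packages/torch
print("torch version ........", torch.__version__)     # 1.8.1 in this run
print("torch cuda version ...", torch.version.cuda)    # 11.1 in this run
print("cuda toolkit (nvcc) ..", CUDA_HOME)             # toolkit with nvcc 11.2
```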
...............11.1 11.1nvcc version .....................nvcc version 11.2..................... 11.2deepspeed install path ...........deepspeed install path ...........['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed info deepspeed info................... ................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. DeepSpeed general environment info: async_io async_io............... [NO]............... [NO]....... .......[NO] [NO] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] transformer_inference .. transformer_inference[NO] ......... [NO][OKAY] torch version .................... 1.8.1 ....... [OKAY] torch cuda version ............... 11.1 utils .................. utils[YES] ........................ [OKAY][YES] nvcc version ..................... 11.2 ...... [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] quantizer ..............quantizer [NO].............. .......[NO] [OKAY]....... [OKAY] -------------------------------------------------- deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference transformer_inference.. ..[NO] [NO]....... 
.......[OKAY] [OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. DeepSpeed general environment info: async_io ............... [NO] ....... [NO] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 transformer_inference .. [NO] ....... [OKAY] nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 async_io ............... [NO] ....... [NO] torch cuda version ............... 11.1 nvcc version ..................... 11.2 transformer_inference .. [NO] ....... [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] utils .................. [YES] ...... [OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer_inference .. [NO] ....... [OKAY] utils async_io.................. ...............[YES] [NO]...... .......[OKAY] [NO] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. torch cuda version ............... 11.1 async_io ............... [NO] ....... [NO] nvcc version ..................... 11.2 transformer_inference .. [NO] ....... [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science utils .................. [YES] ...... [OKAY] deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 quantizer .............. [NO] ....... [OKAY] DeepSpeed general environment info:  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. -------------------------------------------------- torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] async_io ............... [NO] ....... [NO] torch version .................... 1.8.1 transformer_inference .. [NO] ....... [OKAY] torch cuda version ............... 11.1 utils .................. [YES] ...... [OKAY] nvcc version ..................... 11.2 quantizer .............. [NO] ....... [OKAY] deepspeed install path ........... 
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
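Every launched rank prints this summary at startup, which is why it repeats throughout the capture. A minimal sketch for generating it once from a single process instead, assuming the `ds_report` console script's entry point is importable as `deepspeed.env_report.main` (true for recent DeepSpeed releases; the exact module path may differ in 0.4.x):

import torch
import deepspeed
# Assumption: ds_report's entry point; verify the module path for your version.
from deepspeed.env_report import main as ds_report

print(f"torch {torch.__version__}, cuda {torch.version.cuda}")
print(f"deepspeed {deepspeed.__version__}")
ds_report()  # emits the same environment/op tables as the log above, once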
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
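In the table, installed [YES] means the op was pre-built when the wheel was compiled (here only cpu_adam), while [NO] with compatible [OKAY] means the op is JIT-compiled with ninja the first time it is used. A hedged illustration of what triggers such a build, assuming a CUDA-capable node; FusedAdam wraps the fused_adam extension:

import torch
from deepspeed.ops.adam import FusedAdam

# Constructing the optimizer loads the fused_adam extension; if it was not
# pre-built ([NO] above), DeepSpeed JIT-compiles it with ninja at this point.
params = [torch.nn.Parameter(torch.randn(4, 4, device="cuda"))]
opt = FusedAdam(params, lr=1e-3)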
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
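async_io stays at [NO]/[NO] because the libaio headers are missing, and on a shared cluster `apt install libaio-dev` is typically not an option; the op only matters for AIO features such as NVMe offload. A hedged way to probe it programmatically, assuming the op-builder import path used by installed DeepSpeed wheels (it may differ across versions):

# Assumption: builders are importable from deepspeed.ops.op_builder on
# installed wheels; adjust the path for your DeepSpeed version.
from deepspeed.ops.op_builder import AsyncIOBuilder

if AsyncIOBuilder().is_compatible():
    print("async_io can be JIT-built; AIO/NVMe offload is usable")
else:
    print("async_io unavailable; libaio-dev is needed to enable it")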
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
[OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report-------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report-------------------------------------------------- DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... DeepSpeed general environment info:['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] DeepSpeed general environment info:torch version .................... 1.8.1torch install path ...............torch cuda version torch install path ............... 11.1............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']nvcc version ..................... torch version11.2 ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']....................deepspeed install path 1.8.1........... torch version ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']torch cuda version.................... deepspeed info1.8.1............... ...................11.1 torch cuda version 0.4.2+bc17042, bc17042, big-science nvcc version ............... deepspeed wheel compiled w......................11.1 ......11.2 nvcc versiontorch 1.8, cuda 11.1deepspeed install path ................................ 11.2 deepspeed install path['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ...........deepspeed info ...................['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] 0.4.2+bc17042, bc17042, big-sciencedeepspeed info deepspeed wheel compiled w.................... ...... 0.4.2+bc17042, bc17042, big-sciencetorch 1.8, cuda 11.1 deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... 
[OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninjaninjaninjaninja .................. .................. ....................................[OKAY] [OKAY][OKAY][OKAY] -------------------------------------------------- ----------------------------------------------------------------------------------------------------op name --------------------------------------------------................op nameop name installed................ ................op nameinstalled.. installed..compatible ..................compatible-------------------------------------------------- compatible installed -------------------------------------------------- --------------------------------------------------.. cpu_adam ...............compatible [YES]cpu_adam-------------------------------------------------- .....................cpu_adam ............... [YES][OKAY][YES] ............ [OKAY][OKAY] cpu_adam ............... [YES] ......fused_adam [OKAY].............fused_adam fused_adam[NO]............. ....................[NO] [OKAY][NO]....... .......[OKAY] [OKAY]fused_lambfused_adam fused_lamb .......................... fused_lamb [NO] .............[NO]............. [NO] .............. [NO] ....... .......[OKAY] [OKAY] [OKAY][OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attnsparse_attn sparse_attn ............ ........................ [NO] [NO] [NO] .......sparse_attn ....... ................... [OKAY] [OKAY][OKAY][NO] transformer transformertransformer............ ....... ........................[NO][OKAY] [NO][NO]....... ..............[OKAY] transformer[OKAY][OKAY] stochastic_transformer............ stochastic_transformer stochastic_transformer [NO]. ..[NO] ....... [NO][NO] ....... [OKAY].............. [OKAY][OKAY][OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda version torch cuda version............... 
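async_io is the one op reported unavailable here: its JIT build needs the libaio development headers, which is what produces the [WARNING] and the `apt install libaio-dev` hint. A minimal sketch of checking for the library from Python (an illustration of the dependency, not DeepSpeed's actual probe):

    import ctypes.util

    # find_library returns a soname such as "libaio.so.1" when libaio is
    # installed, and None otherwise -- the case that triggers the warning.
    print(ctypes.util.find_library("aio"))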
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
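Every rank prints the same "DeepSpeed general environment info" block; the fields come from public attributes, so the block can be reproduced by hand. A hedged sketch (plain torch/deepspeed introspection, not the exact reporting code -- DeepSpeed also ships this report as the `ds_report` utility):

    import torch
    import deepspeed

    # torch.__path__ is list-like, which is why the log shows bracketed paths.
    print("torch install path ...", list(torch.__path__))
    print("torch version ........", torch.__version__)
    print("torch cuda version ...", torch.version.cuda)
    print("deepspeed info .......", deepspeed.__version__)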
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
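The `git_hash=unknown` lines go together with the `/bin/sh: line 0: type: git: not found` messages: `git` is not on PATH on the compute nodes, so the startup code's shell query for the commit fails and it falls back to "unknown". A sketch of that fallback pattern (an assumption about the mechanism, not Megatron's exact code):

    import subprocess

    def git_info(fallback="unknown"):
        # Query the current commit and branch; fall back when git is absent
        # from PATH or the working directory is not a repository.
        try:
            h = subprocess.check_output(
                ["git", "rev-parse", "--short", "HEAD"],
                stderr=subprocess.DEVNULL).decode().strip()
            b = subprocess.check_output(
                ["git", "rev-parse", "--abbrev-ref", "HEAD"],
                stderr=subprocess.DEVNULL).decode().strip()
            return h, b
        except (OSError, subprocess.CalledProcessError):
            return fallback, fallback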
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] torch versiontorch version ........................................ 1.8.11.8.1 deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] DeepSpeed general environment info:DeepSpeed general environment info: torch version .................... 1.8.1 torch cuda version ............... 11.1 torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] torch versiontorch version ........................................ 1.8.11.8.1 deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch cuda version torch cuda version............... ...............11.1 11.1nvcc version nvcc version..................... .....................11.2 11.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 ninjaninjaninjaninja .................................... .................. 
..................[OKAY][OKAY] [OKAY] [OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op name op name op name ................op name ................ installed................installed ................ ....installed installed compatible compatible .... -------------------------------------------------- -------------------------------------------------- compatiblecompatible ---------------------------------------------------------------------------------------------------- cpu_adam cpu_adam............... cpu_adam cpu_adam ...............[YES] ............... ...............[YES] ...... [YES] [YES] ...... [OKAY] ...... [OKAY]...... [OKAY][OKAY] fused_adam .............fused_adamfused_adam [NO] ............. fused_adam.................... .............[NO][NO][OKAY] [NO] ....... [OKAY] ..............fused_lamb fused_lamb[OKAY].............[OKAY] .............[NO] fused_lamb.......fused_lamb[NO] [OKAY].................... ............. [NO][OKAY] [NO] ....... .......[OKAY] [OKAY] sparse_attn ............ [NO] ....... sparse_attn[OKAY] ............sparse_attntransformer sparse_attn [NO] ........................ ................... [NO] [NO] [NO] .......[OKAY] ....... ....... [OKAY] [OKAY] transformer[OKAY] ............ transformerstochastic_transformer[NO] transformer ............ .................... [NO] [OKAY][NO][NO] ....... ..............[OKAY] stochastic_transformer[OKAY] [OKAY] .stochastic_transformer [NO]stochastic_transformer . ....... [NO].[OKAY] .......[NO] [OKAY]....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found torch version .................... 1.8.1 torch cuda version ............... 11.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: deepspeed info ................... 0.4.2+bc17042, bc17042, big-science torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] DeepSpeed general environment info:torch version .................... 1.8.1 torch cuda versiontorch install path .............................. 11.1 nvcc version ..................... 11.2['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] deepspeed install pathtorch version ............................... 1.8.1['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infotorch cuda version .................................. 0.4.2+bc17042, bc17042, big-science11.1 deepspeed wheel compiled w.nvcc version ........................... torch 1.8, cuda 11.111.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install pathDeepSpeed general environment info: ...............DeepSpeed general environment info: torch install path ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']...............torch install path ............... torch version .................... 
1.8.1['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch cuda version torch version...............torch version 11.1........................................ 1.8.1nvcc version1.8.1 .....................torch cuda version torch cuda version 11.2 ............... ............... deepspeed install path 11.1 11.1 ........... nvcc versionnvcc version ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'].......................................... /bin/sh: line 0: type: git: not found deepspeed info11.211.2 ...................deepspeed install path deepspeed install path 0.4.2+bc17042, bc17042, big-science ...................... deepspeed wheel compiled w.['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ...... deepspeed info deepspeed info torch 1.8, cuda 11.1...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: DeepSpeed general environment info: torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] .................... 1.8.1 torch version ....................torch cuda version 1.8.1............... 11.1 torch cuda version nvcc version............... .....................11.1 11.2nvcc version deepspeed install path..................... ...........11.2 deepspeed install path['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ........... deepspeed info ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']................... deepspeed info 0.4.2+bc17042, bc17042, big-science................... deepspeed wheel compiled w.0.4.2+bc17042, bc17042, big-science ...... deepspeed wheel compiled w.torch 1.8, cuda 11.1 ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 
0.4.2+bc17042, bc17042, big-science DeepSpeed general environment info: deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch version .................... 1.8.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch cuda version ............... 11.1 torch version .................... 1.8.1 nvcc version ..................... 11.2 torch cuda version ............... 11.1 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science nvcc version ..................... 11.2 deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... 
DeepSpeed general environment info: ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch install pathtorch version ................................... 1.8.1 torch cuda version ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']11.1 nvcc version ..................... torch version11.2 ....................deepspeed install path 1.8.1........... torch cuda version['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ...............deepspeed info 11.1................... 0.4.2+bc17042, bc17042, big-sciencenvcc version .....................deepspeed wheel compiled w. 11.2...... deepspeed install pathtorch 1.8, cuda 11.1 ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path ............... ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version ....................torch version 1.8.1.................... 1.8.1 torch cuda version ...............torch cuda version 11.1............... nvcc version11.1 .....................nvcc version 11.2..................... deepspeed install path11.2 ...........deepspeed install path ...........['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed info ...................deepspeed info 0.4.2+bc17042, bc17042, big-science................... 0.4.2+bc17042, bc17042, big-sciencedeepspeed wheel compiled w. ......deepspeed wheel compiled w. torch 1.8, cuda 11.1...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... DeepSpeed general environment info: ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch install pathtorch version ................................... 1.8.1 torch cuda version ...............['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] 11.1 nvcc versiontorch version ......................................... 11.21.8.1 deepspeed install path torch cuda version........... ............... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. deepspeed infonvcc version ........................................ 
[WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.

async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
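async_io is the only capability reported unavailable here, and only because libaio is missing from the node image; the other ops are compatible and can be JIT-built. A small check, assuming `libaio.so` is the shared library the async_io builder links against, to confirm its absence before asking the cluster admins to apply the suggested `apt install libaio-dev`:

    import ctypes.util

    # Hedged sketch: query the dynamic linker's view of the system. If libaio
    # is absent, DeepSpeed's async_io op cannot be compiled and stays [NO].
    if ctypes.util.find_library("aio") is None:
        print("libaio not found: async_io op cannot be JIT-built on this node")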
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
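Ops marked [NO] under `installed` but [OKAY] under `compatible` are compiled just-in-time with ninja the first time they are used. A sketch of forcing one such build up front, assuming the op-builder API of this DeepSpeed branch (0.4.2+bc17042) exposes the builders under deepspeed.ops.op_builder:

    from deepspeed.ops.op_builder import FusedAdamBuilder

    # fused_adam is [NO]/[OKAY] above: not prebuilt, but buildable. Calling
    # .load() makes ninja JIT-compile the extension once and cache the result,
    # so the cost is paid here rather than at the first optimizer step.
    fused_adam = FusedAdamBuilder().load()
    print(fused_adam)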
torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path DeepSpeed general environment info:........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ...................torch install path 0.4.2+bc17042, bc17042, big-science ............... deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. -------------------------------------------------- async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. 
[NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. 
async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: DeepSpeed general environment info:torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'].................... 1.8.1 torch version torch cuda version.................... ...............1.8.1 11.1 torch cuda versionnvcc version .................................... 11.111.2 nvcc versiondeepspeed install path ................................ 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... DeepSpeed general environment info:['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version ....................torch install path 1.8.1............... torch cuda version ............... 11.1 ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']nvcc version ..................... 11.2torch version deepspeed install path.................... ...........1.8.1 ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']torch cuda version deepspeed info............... ...................11.1 nvcc version0.4.2+bc17042, bc17042, big-science .....................deepspeed wheel compiled w. 11.2...... deepspeed install pathtorch 1.8, cuda 11.1 ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO] utils....... ..................[OKAY] [YES] ...... [OKAY] utils .................. [YES]quantizer .................... [OKAY][NO] ....... [OKAY] quantizer --------------------------------------------------.............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version ............... ...............11.1 11.1nvcc version nvcc version..................... .....................11.2 11.2deepspeed install path deepspeed install path........... ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info deepspeed info................... ...................0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.0.4.2+bc17042, bc17042, big-science ......deepspeed wheel compiled w. torch 1.8, cuda 11.1...... 
torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja-------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO]async_io ...................... [NO] [NO] ....... [NO] transformer_inference .. transformer_inference[NO] ......... [NO][OKAY] ....... [OKAY] utils .................. [YES]utils ........................ [OKAY][YES] ...... [OKAY] quantizer ..............quantizer [NO].............. .......[NO] [OKAY]....... 
[OKAY] -------------------------------------------------- -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninjaninjaninjaninja .................................... .................................... [OKAY] [OKAY] [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- op nameop name op nameop name ................ ................................ ................ installedinstalledinstalledinstalled ........ compatiblecompatiblecompatiblecompatible -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- cpu_adamcpu_adam cpu_adamcpu_adam .............................. ............... ............... [YES][YES] [YES] [YES] ............ ............ [OKAY][OKAY][OKAY][OKAY] fused_adamfused_adamfused_adam fused_adam............. ............. ............. .............[NO][NO] [NO][NO].............. ..............[OKAY][OKAY] [OKAY][OKAY] fused_lambfused_lambfused_lamb ..........................fused_lamb............. [NO] [NO] [NO]............. ....... ....... .......[NO] [OKAY] [OKAY][OKAY] ....... [OKAY] sparse_attnsparse_attnsparse_attn .................................... [NO]sparse_attn[NO][NO] ....... ............ .............. [OKAY] [NO] [OKAY][OKAY] ....... transformer[OKAY] transformer transformer............ ........................[NO] [NO][NO].......transformer ..............[OKAY]............ [OKAY][OKAY][NO] ....... stochastic_transformer[OKAY] stochastic_transformerstochastic_transformer . [NO].. stochastic_transformer .......[NO] [NO] [OKAY]........ .......[NO][OKAY] [OKAY]....... [OKAY] DeepSpeed general environment info: DeepSpeed general environment info:torch install path ............... torch install pathDeepSpeed general environment info: ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']............... torch version ....................torch install path 1.8.1['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ............... torch cuda versiontorch version ................................... 11.11.8.1 ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] nvcc version torch cuda version..................... torch version ............... 11.2 .................... 11.1 deepspeed install path1.8.1nvcc version ................................ torch cuda version ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']11.2............... 
deepspeed info11.1deepspeed install path ..............................nvcc version 0.4.2+bc17042, bc17042, big-science..................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed wheel compiled w.11.2 deepspeed info......deepspeed install path ...................torch 1.8, cuda 11.1........... 0.4.2+bc17042, bc17042, big-science ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed wheel compiled w. deepspeed info...... ...................torch 1.8, cuda 11.1 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report-------------------------------------------------- JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninjaninjaninjaninja .................................... .................. ..................[OKAY][OKAY][OKAY] [OKAY]---------------------------------------------------------------------------------------------------- -------------------------------------------------- op name op name-------------------------------------------------- ................ op name op name................installed .................................. installed installed installed ..compatible.. ..compatible--------------------------------------------------compatible compatible---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES]cpu_adam cpu_adam...... cpu_adam.............................. [OKAY][YES][YES]............... ............ [YES] [OKAY] [OKAY] ...... fused_adam[OKAY] ............. [NO] ....... [OKAY]fused_adam fused_adam .......................... fused_lamb[NO][NO] fused_adam .................... ....... [OKAY]............. [NO] [OKAY] fused_lamb [NO] .................... [OKAY]fused_lamb[NO] ........................... [NO][OKAY] [OKAY] ....... [OKAY] fused_lambsparse_attn ......................... [NO]sparse_attn [NO] ....... ............ .......[OKAY]sparse_attn[NO] [OKAY].......transformer............ [OKAY] ............[NO] [NO]....... transformer ....... [OKAY] ............ [OKAY] [NO]transformer ................... sparse_attn[OKAY][NO]stochastic_transformer ...................stochastic_transformer . [OKAY] [NO].[NO] .......stochastic_transformer [OKAY][NO] ............... [OKAY][OKAY][NO] ....... [OKAY]transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path DeepSpeed general environment info:...............DeepSpeed general environment info: DeepSpeed general environment info: ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']torch install pathtorch install path torch install path ..............................torch version ............... .................... 1.8.1 ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']torch cuda version['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ............... 11.1torch versiontorch versiontorch version ....................nvcc version........................................ 1.8.1.....................1.8.11.8.1 11.2torch cuda version torch cuda versiontorch cuda version deepspeed install path............... ............... ............... ...........11.1 11.1 11.1 nvcc version ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']nvcc version nvcc version ..................... ..................... deepspeed info..................... 11.2 11.211.2 ................... 
deepspeed install path deepspeed install pathdeepspeed install path 0.4.2+bc17042, bc17042, big-science................................. deepspeed wheel compiled w. ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']...... deepspeed infodeepspeed infodeepspeed infotorch 1.8, cuda 11.1 ......................................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. deepspeed wheel compiled w. ...... ...... ...... torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ 
torch 1.8, cuda 11.1torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report -------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja-------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja ninjaninjaninjaninja .................. .................................... .................. [OKAY] [OKAY] [OKAY][OKAY] -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- op nameop name op nameop name ................ ................................ ................ installed installed installedinstalled .. ....compatible.. compatiblecompatible--------------------------------------------------compatible ---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES]cpu_adamcpu_adam cpu_adam ...... .............................. ............... [YES][OKAY] [YES] ......[YES] [OKAY]............ [OKAY][OKAY] fused_adam ............. [NO] ....... fused_adam[OKAY] .............fused_adamfused_adam [NO]fused_lamb.......................... ....................[NO][NO] [OKAY].............. [NO] [OKAY][OKAY]....... fused_lamb [OKAY]............. fused_lambfused_lamb[NO] ................................. [NO][OKAY][NO] ....... sparse_attn ....... [OKAY]............ [OKAY][NO] ....... [OKAY] sparse_attn ............transformer [NO]............ sparse_attn[NO]....... 
sparse_attn............[OKAY]....... ............[OKAY][NO] transformer [NO] .......stochastic_transformer............ [NO] [OKAY] ....... ........ [OKAY]transformer[OKAY][NO] ................... transformer [NO] [OKAY] ............stochastic_transformer ....... [NO][OKAY] ........ [NO][OKAY] .......stochastic_transformer [OKAY]stochastic_transformer . [NO]. .......[NO] [OKAY]....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science /bin/sh: line 0: type: git: not found deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- JIT compiled ops requires ninja-------------------------------------------------- DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninjaninjaninjaninja .................. .................................... .................. [OKAY][OKAY] [OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- -------------------------------------------------- -------------------------------------------------- op nameop name op name................op name................ installed................................installed installed....installed compatible..compatible.. --------------------------------------------------compatiblecompatible -------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adam ............... cpu_adam[YES] cpu_adam..................... cpu_adam...............[YES][OKAY] .....................[YES] [OKAY][YES]...... ......[OKAY] [OKAY]fused_adam ............. [NO] ....... [OKAY]fused_adam ............. [NO]fused_adamfused_lamb ....... fused_adam .......................... [OKAY] ............. [NO] [NO] [NO].............. fused_lamb ....... [OKAY][OKAY] ............. [OKAY] [NO] fused_lamb....... .............fused_lamb [OKAY] [NO] ............. .......sparse_attn [NO] [OKAY] ................... [NO][OKAY] .......sparse_attn [OKAY]............ [NO] transformer....... ............[OKAY] [NO] .......transformer sparse_attnsparse_attn............[OKAY] ............[NO] ............[NO]stochastic_transformer ....... [NO] ....... [OKAY] ........ [OKAY] [OKAY][NO] stochastic_transformertransformer....... transformer............[OKAY]. ............[NO] [NO] [NO] ....... ..............[OKAY] [OKAY] [OKAY] stochastic_transformer . stochastic_transformer[NO] ....... .[OKAY] [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... 
> initializing tensor model parallel with size 4
> initializing pipeline model parallel with size 8
> setting random seeds to 42 ...
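With tensor model parallelism of 4 and pipeline model parallelism of 8, each model replica spans 4 x 8 = 32 GPUs, and the data-parallel degree is whatever remains of the world size. The world size itself is not printed in this excerpt, so it is a parameter in this small arithmetic sketch:

    # Parallel-topology arithmetic implied by the init messages above.
    # world_size is an assumption: it is not printed in this excerpt.
    def dp_degree(world_size, tp=4, pp=8):
        gpus_per_replica = tp * pp          # 32 GPUs per model replica
        assert world_size % gpus_per_replica == 0
        return world_size // gpus_per_replica

    print(dp_degree(world_size=128))        # hypothetical world size -> DP = 4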
[2021-09-27 03:54:33,898] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2760 and data parallel seed: 42
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/data'
>>> done with dataset index builder. Compilation time: 0.303 seconds
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !!
Your compiler (c++) is not compatible with the compiler PyTorch was built with for this platform, which is g++ on linux. Please use g++ to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Building extension module scaled_masked_softmax_cuda...
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
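The model-parallel seed printed above is consistent with the base seed plus a fixed offset: 2760 = 42 + 2718, with the tensor-parallel rank added on other ranks (rank 0 here), so that each tensor-parallel rank draws different dropout masks while data-parallel replicas stay in sync. The offset-plus-rank scheme is an inference from the printed numbers, sketched here:

    # Seed derivation consistent with the log line above
    # (model parallel seed 2760, data parallel seed 42, rank 0).
    # The 2718 offset-plus-rank scheme is an assumption inferred from the numbers.
    def model_parallel_seed(base_seed, tp_rank, offset=2718):
        return base_seed + offset + tp_rank

    assert model_parallel_seed(42, tp_rank=0) == 2760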
Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/fused_kernels/build/build.ninja...
Building extension module fused_mix_prec_layer_norm_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
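The "Emitting ninja build file ... / Loading extension module ..." lines are PyTorch's JIT C++/CUDA extension loader at work; "ninja: no work to do" means the previously compiled build was found in the cache and simply reloaded, which is why the whole step took well under a second. A generic sketch of that mechanism via torch.utils.cpp_extension.load (the source file names are placeholders, not Megatron's actual kernels):

    # Generic JIT-compile-and-load via PyTorch's cpp_extension machinery,
    # the same path that prints "Emitting ninja build file ..." above.
    # "my_kernel.cpp" / "my_kernel.cu" are placeholder sources, not Megatron's.
    from torch.utils.cpp_extension import load

    my_ext = load(
        name="my_fused_kernel",
        sources=["my_kernel.cpp", "my_kernel.cu"],
        verbose=True,  # prints the ninja build messages seen in the log
    )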
warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( >>> done with compiling and loading fused kernels. Compilation time: 20.521 seconds time to initialize megatron (seconds): -17.161 [after megatron is initialized] datetime: 2021-09-27 03:54:54 building GPT model ... 
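The compiler warning above is typically harmless on this cluster (`c++` is usually just a symlink to `g++`), but it can be avoided by pointing `torch.utils.cpp_extension` at `g++` explicitly: the JIT build path picks its host compiler from the `CXX` environment variable. A minimal sketch, under the assumption that it runs before Megatron compiles its fused kernels:

```python
import os

# Make torch.utils.cpp_extension use g++ for JIT-compiled extensions
# (it reads the CXX environment variable, falling back to "c++").
# Must be set before megatron's fused-kernel compilation is triggered.
os.environ.setdefault("CXX", "g++")
```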
[2021-09-27 03:54:54,901] [INFO] [utils.py:680:see_memory_usage] Before Building Model
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/cuda/memory.py:373: FutureWarning: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/cuda/memory.py:381: FutureWarning: torch.cuda.max_memory_cached has been renamed to torch.cuda.max_memory_reserved
[2021-09-27 03:54:54,903] [INFO] [utils.py:681:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
[2021-09-27 03:54:54,903] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 36.85 GB, percent = 19.7%
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=0, data=1, model=0): 4, ..., ProcessCoord(pipe=7, data=15, model=3): 511}
(the full 512-entry mapping is elided: the model axis varies fastest, then data, then pipe, over pipe=0-7, data=0-15, model=0-3)
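The 512-entry mapping is mechanical once the axis order is known. A minimal sketch of the coordinate-to-rank arithmetic, assuming (as the printed mapping shows) model-fastest ordering for PP=8 x DP=16 x TP=4:

```python
# Reconstruct the (pipe, data, model) -> global rank layout printed above.
# Assumption: model varies fastest, then data, then pipe (PP=8, DP=16, TP=4),
# which is exactly what the ProcessCoord listing shows.
PP, DP, TP = 8, 16, 4

def coord_to_rank(pipe: int, data: int, model: int) -> int:
    return pipe * (DP * TP) + data * TP + model

def rank_to_coord(rank: int) -> tuple:
    pipe, rem = divmod(rank, DP * TP)
    data, model = divmod(rem, TP)
    return pipe, data, model

# Spot-checks against entries in the log:
assert coord_to_rank(0, 0, 0) == 0
assert coord_to_rank(1, 0, 0) == 64     # ProcessCoord(pipe=1, data=0, model=0): 64
assert coord_to_rank(2, 0, 2) == 130    # ProcessCoord(pipe=2, data=0, model=2): 130
assert coord_to_rank(7, 15, 3) == 511   # last rank in the mapping
assert rank_to_coord(384) == (6, 0, 0)  # stage-6 rank seen later in the log
```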
[2021-09-27 03:54:57,678] [INFO] [module.py:360:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=7
     0: _to_float16
     1: EmbeddingPipe
     2:
     3: ParallelTransformerLayerPipe
     4: ParallelTransformerLayerPipe
     5: ParallelTransformerLayerPipe
     6: ParallelTransformerLayerPipe
stage=1 layers=4
     7: ParallelTransformerLayerPipe
     8: ParallelTransformerLayerPipe
     9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
(stages 2-6 are identical to stage 1: four ParallelTransformerLayerPipe each, covering layers 11-14, 15-18, 19-22, 23-26 and 27-30)
stage=7 layers=8
    31: ParallelTransformerLayerPipe
    32: ParallelTransformerLayerPipe
    33: ParallelTransformerLayerPipe
    34: ParallelTransformerLayerPipe
    35:
    36: MixedFusedLayerNorm
    37: EmbeddingPipe
    38: float16_to_fp32
loss: CrossEntropy
> number of parameters on (tensor, pipeline) model parallel rank (t, p): 1745293312 for every tensor rank t=0-3 on each middle pipeline stage p=1-6
> number of parameters on (tensor, pipeline) model parallel rank (t, 0): 1986465792 for every tensor rank on the first stage
> number of parameters on (tensor, pipeline) model parallel rank (t, 7): 1986498560 for every tensor rank on the last stage
[2021-09-27 03:54:59,504] [INFO] [utils.py:680:see_memory_usage] After Building Model
[2021-09-27 03:54:59,505] [INFO] [utils.py:681:see_memory_usage] MA 3.77 GB Max_MA 3.79 GB CA 3.79 GB Max_CA 4 GB
[2021-09-27 03:54:59,505] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 37.03 GB, percent = 19.8%
setting training iterations to 159576
> learning rate decay style: cosine
DeepSpeed is enabled.
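The stage boundaries that reappear in the RANK=... lines below follow directly from this partitioning. A rough reconstruction of the `type:transformer` method, under the assumption (consistent with the listing above) that only the 32 ParallelTransformerLayerPipe layers are balanced across the 8 stages, with the cast/embedding prologue and norm/embedding/cast epilogue pinned to the ends; this is an illustrative sketch, not DeepSpeed's actual implementation:

```python
# Reproduce the printed stage boundaries for the 39-layer pipeline module.
layers = (["_to_float16", "EmbeddingPipe", ""]            # indices 0-2 (stage-0 prologue)
          + ["ParallelTransformerLayerPipe"] * 32         # indices 3-34
          + ["", "MixedFusedLayerNorm", "EmbeddingPipe",  # indices 35-37 (stage-7 epilogue)
             "float16_to_fp32"])                          # index 38

n_stages = 8
transformer_idx = [i for i, l in enumerate(layers) if l == "ParallelTransformerLayerPipe"]
per_stage = len(transformer_idx) // n_stages              # 32 // 8 = 4 transformer layers per stage

for s in range(n_stages):
    chunk = transformer_idx[s * per_stage:(s + 1) * per_stage]
    lo = 0 if s == 0 else chunk[0]                        # first/last stages absorb the extras
    hi = len(layers) if s == n_stages - 1 else chunk[-1] + 1
    print(f"stage={s} layers={hi - lo} [{lo}, {hi})")
    # prints: stage=0 layers=7 [0, 7), stage=1 layers=4 [7, 11), ..., stage=7 layers=8 [31, 39)
```

The extra embedding on stages 0 and 7 is also why their STAGE_PARAMS (1986M) exceed the middle stages' (1745M) in the engine lines below.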
[2021-09-27 03:54:59,644] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.4.2+bc17042, git-hash=bc17042, git-branch=big-science
[2021-09-27 03:54:59,753] [INFO] [engine.py:179:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-09-27 03:54:59,753] [INFO] [engine.py:736:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-09-27 03:54:59,753] [INFO] [engine.py:741:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-09-27 03:54:59,753] [INFO] [engine.py:750:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2021-09-27 03:54:59,753] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2021-09-27 03:54:59,753] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2021-09-27 03:54:59,753] [INFO] [stage2.py:106:__init__] Reduce bucket size 500000000
[2021-09-27 03:54:59,753] [INFO] [stage2.py:107:__init__] Allgather bucket size 500000000
[2021-09-27 03:54:59,753] [INFO] [stage2.py:108:__init__] CPU Offload: False
[2021-09-27 03:54:59,753] [INFO] [stage2.py:109:__init__] Round robin gradient partitioning: False
[2021-09-27 03:55:04,471] [INFO] [stage2.py:419:__init__] optimizer state initialized
[2021-09-27 03:55:04,471] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2021-09-27 03:55:04,471] [INFO] [engine.py:553:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2021-09-27 03:55:04,471] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2021-09-27 03:55:04,471] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-09-27 03:55:04,471] [INFO] [config.py:900:print] DeepSpeedEngine configuration:
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false }
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] allreduce_always_fp32 ........ False
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] amp_enabled .................. False
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] amp_params ................... False
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] checkpoint_tag_validation_enabled True
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] checkpoint_tag_validation_fail False
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] disable_allgather ............ False
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] dump_state ................... False
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] eigenvalue_enabled ........... False
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] eigenvalue_gas_boundary_resolution 1
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] eigenvalue_layer_name ........ bert.encoder.layer
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] eigenvalue_layer_num ......... 0
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] eigenvalue_max_iter .......... 100
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] eigenvalue_stability ......... 1e-06
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] eigenvalue_tol ............... 0.01
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] eigenvalue_verbose ........... False
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] elasticity_enabled ........... False
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null }
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] fp16_enabled ................. True
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] fp16_mixed_quantize .......... False
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] global_rank .................. 0
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] gradient_accumulation_steps .. 128
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] gradient_clipping ............ 1.0
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] gradient_predivide_factor .... 1.0
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] initial_dynamic_scale ........ 4096
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] loss_scale ................... 0
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] memory_breakdown ............. False
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] optimizer_legacy_fusion ...... False
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] optimizer_name ............... None
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] optimizer_params ............. None
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] pld_enabled .................. False
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] pld_params ................... False
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] prescale_gradients ........... False
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] quantize_change_rate ......... 0.001
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] quantize_groups .............. 1
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] quantize_offset .............. 1000
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] quantize_period .............. 1000
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] quantize_rounding ............ 0
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] quantize_start_bits .......... 16
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] quantize_target_bits ......... 8
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] quantize_training_enabled .... False
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] quantize_type ................ 0
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] quantize_verbose ............. False
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] scheduler_name ............... None
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] scheduler_params ............. None
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] sparse_attention ............. None
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] sparse_gradients_enabled ..... False
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] steps_per_print .............. 2000
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] tensorboard_enabled .......... False
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] tensorboard_job_name ......... DeepSpeedJobName
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] tensorboard_output_path ......
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] train_batch_size ............. 2048
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] train_micro_batch_size_per_gpu 1
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] use_quantizer_kernel ......... False
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] wall_clock_breakdown ......... False
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] world_size ................... 16
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] zero_allow_untested_optimizer False
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] zero_config .................. { "stage": 1, "contiguous_gradients": false, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": true, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_fp16_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false }
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] zero_enabled ................. True
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] zero_optimization_stage ...... 1
[2021-09-27 03:55:04,473] [INFO] [config.py:906:print] json = {
    "train_micro_batch_size_per_gpu": 1,
    "train_batch_size": 2.048000e+03,
    "gradient_clipping": 1.0,
    "zero_optimization": {
        "stage": 1
    },
    "fp16": {
        "enabled": true,
        "loss_scale": 0,
        "loss_scale_window": 500,
        "hysteresis": 2,
        "min_loss_scale": 1,
        "initial_scale_power": 12
    },
    "steps_per_print": 2.000000e+03,
    "wall_clock_breakdown": false
}
[2021-09-27 03:55:04,474] [INFO] [engine.py:76:__init__] CONFIG: micro_batches=128 micro_batch_size=1
[2021-09-27 03:55:04,910] [INFO] [engine.py:134:__init__] RANK=0 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=64 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=128 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=192 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=256 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=320 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=384 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=448 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
(each stage's other three tensor-parallel ranks, e.g. RANK=1-3 for stage 0 and RANK=449-451 for stage 7, print the same record; those repeats are elided here)
> using checkpoint value 6e-05 for learning rate
> using checkpoint value 6e-06 for minimum learning rate
> using checkpoint value 216320 for warmup iterations
> using checkpoint value 126953125 for total number of iterations
> using checkpoint value cosine for decay style
successfully loaded 8 ZeRO state_dicts for rank 384
successfully loaded 8 ZeRO state_dicts for rank 424
successfully loaded 8 ZeRO state_dicts for rank 444
loading 8 zero partition checkpoints for rank 384
("successfully loaded 8 ZeRO state_dicts for rank N" / "loading 8 zero partition checkpoints for rank N" pairs repeat, interleaved across the ranks, for the rest of the checkpoint load; the excerpt cuts off mid-stream and the repeats are elided here)
state_dicts for rank 159 successfully loaded 8 ZeRO state_dicts for rank 179 successfully loaded 8 ZeRO state_dicts for rank 60 successfully loaded 8 ZeRO state_dicts for rank 402 successfully loaded 8 ZeRO state_dicts for rank 349 successfully loaded 8 ZeRO state_dicts for rank 188 successfully loaded 8 ZeRO state_dicts for rank 410 successfully loaded 8 ZeRO state_dicts for rank 220 successfully loaded 8 ZeRO state_dicts for rank 101 successfully loaded 8 ZeRO state_dicts for rank 398 successfully loaded 8 ZeRO state_dicts for rank 281 successfully loaded 8 ZeRO state_dicts for rank 254 successfully loaded 8 ZeRO state_dicts for rank 474 successfully loaded 8 ZeRO state_dicts for rank 333 successfully loaded 8 ZeRO state_dicts for rank 358 successfully loaded 8 ZeRO state_dicts for rank 363 successfully loaded 8 ZeRO state_dicts for rank 184 successfully loaded 8 ZeRO state_dicts for rank 82 successfully loaded 8 ZeRO state_dicts for rank 80 successfully loaded 8 ZeRO state_dicts for rank 471 successfully loaded 8 ZeRO state_dicts for rank 453 successfully loaded 8 ZeRO state_dicts for rank 345 successfully loaded 8 ZeRO state_dicts for rank 81 successfully loaded 8 ZeRO state_dicts for rank 76 successfully loaded 8 ZeRO state_dicts for rank 85 successfully loaded 8 ZeRO state_dicts for rank 434 successfully loaded 8 ZeRO state_dicts for rank 267 successfully loaded 8 ZeRO state_dicts for rank 230 loading 8 zero partition checkpoints for rank 197 successfully loaded 8 ZeRO state_dicts for rank 295 successfully loaded 8 ZeRO state_dicts for rank 353 loading 8 zero partition checkpoints for rank 437 successfully loaded 8 ZeRO state_dicts for rank 273 successfully loaded 8 ZeRO state_dicts for rank 202 successfully loaded 8 ZeRO state_dicts for rank 36 successfully loaded 8 ZeRO state_dicts for rank 470 successfully loaded 8 ZeRO state_dicts for rank 357 successfully loaded 8 ZeRO state_dicts for rank 151 successfully loaded 8 ZeRO state_dicts for rank 301 successfully loaded 8 ZeRO state_dicts for rank 315 loading 8 zero partition checkpoints for rank 174 successfully loaded 8 ZeRO state_dicts for rank 209 successfully loaded 8 ZeRO state_dicts for rank 113 successfully loaded 8 ZeRO state_dicts for rank 160 loading 8 zero partition checkpoints for rank 125 successfully loaded 8 ZeRO state_dicts for rank 201 successfully loaded 8 ZeRO state_dicts for rank 104 loading 8 zero partition checkpoints for rank 248 successfully loaded 8 ZeRO state_dicts for rank 370 successfully loaded 8 ZeRO state_dicts for rank 311 successfully loaded 8 ZeRO state_dicts for rank 11 successfully loaded 8 ZeRO state_dicts for rank 478 successfully loaded 8 ZeRO state_dicts for rank 227 successfully loaded 8 ZeRO state_dicts for rank 183 successfully loaded 8 ZeRO state_dicts for rank 272 successfully loaded 8 ZeRO state_dicts for rank 255 loading 8 zero partition checkpoints for rank 194 successfully loaded 8 ZeRO state_dicts for rank 9 loading 8 zero partition checkpoints for rank 204 successfully loaded 8 ZeRO state_dicts for rank 93 successfully loaded 8 ZeRO state_dicts for rank 399 successfully loaded 8 ZeRO state_dicts for rank 451 successfully loaded 8 ZeRO state_dicts for rank 168 successfully loaded 8 ZeRO state_dicts for rank 200 successfully loaded 8 ZeRO state_dicts for rank 316 loading 8 zero partition checkpoints for rank 158 successfully loaded 8 ZeRO state_dicts for rank 91 successfully loaded 8 ZeRO state_dicts for rank 73 loading 8 zero partition checkpoints for rank 441 successfully loaded 8 
ZeRO state_dicts for rank 418 successfully loaded 8 ZeRO state_dicts for rank 448 successfully loaded 8 ZeRO state_dicts for rank 187 successfully loaded 8 ZeRO state_dicts for rank 356 successfully loaded 8 ZeRO state_dicts for rank 269 loading 8 zero partition checkpoints for rank 299 successfully loaded 8 ZeRO state_dicts for rank 131 successfully loaded 8 ZeRO state_dicts for rank 361 loading 8 zero partition checkpoints for rank 277 successfully loaded 8 ZeRO state_dicts for rank 39 successfully loaded 8 ZeRO state_dicts for rank 350 loading 8 zero partition checkpoints for rank 391 loading 8 zero partition checkpoints for rank 297 successfully loaded 8 ZeRO state_dicts for rank 107 loading 8 zero partition checkpoints for rank 234 loading 8 zero partition checkpoints for rank 242 successfully loaded 8 ZeRO state_dicts for rank 318 successfully loaded 8 ZeRO state_dicts for rank 373 successfully loaded 8 ZeRO state_dicts for rank 475 successfully loaded 8 ZeRO state_dicts for rank 103 successfully loaded 8 ZeRO state_dicts for rank 472 successfully loaded 8 ZeRO state_dicts for rank 221 successfully loaded 8 ZeRO state_dicts for rank 210 loading 8 zero partition checkpoints for rank 192 successfully loaded 8 ZeRO state_dicts for rank 368 successfully loaded 8 ZeRO state_dicts for rank 140 successfully loaded 8 ZeRO state_dicts for rank 268 successfully loaded 8 ZeRO state_dicts for rank 456 successfully loaded 8 ZeRO state_dicts for rank 455 successfully loaded 8 ZeRO state_dicts for rank 321 successfully loaded 8 ZeRO state_dicts for rank 462 successfully loaded 8 ZeRO state_dicts for rank 284 successfully loaded 8 ZeRO state_dicts for rank 117 successfully loaded 8 ZeRO state_dicts for rank 41 loading 8 zero partition checkpoints for rank 394 successfully loaded 8 ZeRO state_dicts for rank 359 successfully loaded 8 ZeRO state_dicts for rank 375 successfully loaded 8 ZeRO state_dicts for rank 215 successfully loaded 8 ZeRO state_dicts for rank 181 loading 8 zero partition checkpoints for rank 423 successfully loaded 8 ZeRO state_dicts for rank 10 successfully loaded 8 ZeRO state_dicts for rank 100 successfully loaded 8 ZeRO state_dicts for rank 191 loading 8 zero partition checkpoints for rank 178 successfully loaded 8 ZeRO state_dicts for rank 294 loading 8 zero partition checkpoints for rank 332 successfully loaded 8 ZeRO state_dicts for rank 207 successfully loaded 8 ZeRO state_dicts for rank 371 loading 8 zero partition checkpoints for rank 401 successfully loaded 8 ZeRO state_dicts for rank 203 successfully loaded 8 ZeRO state_dicts for rank 37 successfully loaded 8 ZeRO state_dicts for rank 324 loading 8 zero partition checkpoints for rank 241 loading 8 zero partition checkpoints for rank 422 loading 8 zero partition checkpoints for rank 199 successfully loaded 8 ZeRO state_dicts for rank 35 successfully loaded 8 ZeRO state_dicts for rank 322 successfully loaded 8 ZeRO state_dicts for rank 258 successfully loaded 8 ZeRO state_dicts for rank 329 successfully loaded 8 ZeRO state_dicts for rank 222 successfully loaded 8 ZeRO state_dicts for rank 460 loading 8 zero partition checkpoints for rank 380 loading 8 zero partition checkpoints for rank 421 successfully loaded 8 ZeRO state_dicts for rank 323 loading 8 zero partition checkpoints for rank 256 loading 8 zero partition checkpoints for rank 433 loading 8 zero partition checkpoints for rank 229 successfully loaded 8 ZeRO state_dicts for rank 302 loading 8 zero partition checkpoints for rank 265 successfully loaded 8 ZeRO 
state_dicts for rank 74 successfully loaded 8 ZeRO state_dicts for rank 144 successfully loaded 8 ZeRO state_dicts for rank 223 successfully loaded 8 ZeRO state_dicts for rank 225 loading 8 zero partition checkpoints for rank 153 successfully loaded 8 ZeRO state_dicts for rank 72 successfully loaded 8 ZeRO state_dicts for rank 138 successfully loaded 8 ZeRO state_dicts for rank 190 loading 8 zero partition checkpoints for rank 246 successfully loaded 8 ZeRO state_dicts for rank 118 successfully loaded 8 ZeRO state_dicts for rank 406 successfully loaded 8 ZeRO state_dicts for rank 413 successfully loaded 8 ZeRO state_dicts for rank 397 successfully loaded 8 ZeRO state_dicts for rank 264 loading 8 zero partition checkpoints for rank 429 successfully loaded 8 ZeRO state_dicts for rank 275 loading 8 zero partition checkpoints for rank 237 loading 8 zero partition checkpoints for rank 403 loading 8 zero partition checkpoints for rank 378 loading 8 zero partition checkpoints for rank 232 successfully loaded 8 ZeRO state_dicts for rank 71 loading 8 zero partition checkpoints for rank 257 loading 8 zero partition checkpoints for rank 389 successfully loaded 8 ZeRO state_dicts for rank 115 successfully loaded 8 ZeRO state_dicts for rank 111 successfully loaded 8 ZeRO state_dicts for rank 108 successfully loaded 8 ZeRO state_dicts for rank 66 successfully loaded 8 ZeRO state_dicts for rank 213 successfully loaded 8 ZeRO state_dicts for rank 186 successfully loaded 8 ZeRO state_dicts for rank 43 successfully loaded 8 ZeRO state_dicts for rank 304 successfully loaded 8 ZeRO state_dicts for rank 211 loading 8 zero partition checkpoints for rank 393 successfully loaded 8 ZeRO state_dicts for rank 347 loading 8 zero partition checkpoints for rank 443 loading 8 zero partition checkpoints for rank 386 successfully loaded 8 ZeRO state_dicts for rank 314 successfully loaded 8 ZeRO state_dicts for rank 208 successfully loaded 8 ZeRO state_dicts for rank 459 successfully loaded 8 ZeRO state_dicts for rank 165 successfully loaded 8 ZeRO state_dicts for rank 419 loading 8 zero partition checkpoints for rank 278 successfully loaded 8 ZeRO state_dicts for rank 83 successfully loaded 8 ZeRO state_dicts for rank 362 loading 8 zero partition checkpoints for rank 367 loading 8 zero partition checkpoints for rank 180 successfully loaded 8 ZeRO state_dicts for rank 163 successfully loaded 8 ZeRO state_dicts for rank 214 successfully loaded 8 ZeRO state_dicts for rank 116 successfully loaded 8 ZeRO state_dicts for rank 303 successfully loaded 8 ZeRO state_dicts for rank 374 loading 8 zero partition checkpoints for rank 126 loading 8 zero partition checkpoints for rank 339 successfully loaded 8 ZeRO state_dicts for rank 274 loading 8 zero partition checkpoints for rank 292 loading 8 zero partition checkpoints for rank 128 successfully loaded 8 ZeRO state_dicts for rank 114 loading 8 zero partition checkpoints for rank 206 successfully loaded 8 ZeRO state_dicts for rank 372 successfully loaded 8 ZeRO state_dicts for rank 449 successfully loaded 8 ZeRO state_dicts for rank 40 successfully loaded 8 ZeRO state_dicts for rank 409 loading 8 zero partition checkpoints for rank 69 loading 8 zero partition checkpoints for rank 298 loading 8 zero partition checkpoints for rank 123 successfully loaded 8 ZeRO state_dicts for rank 3 loading 8 zero partition checkpoints for rank 366 loading 8 zero partition checkpoints for rank 279 successfully loaded 8 ZeRO state_dicts for rank 47 successfully loaded 8 ZeRO state_dicts for rank 259 
successfully loaded 8 ZeRO state_dicts for rank 479 loading 8 zero partition checkpoints for rank 235 successfully loaded 8 ZeRO state_dicts for rank 67 loading 8 zero partition checkpoints for rank 447 successfully loaded 8 ZeRO state_dicts for rank 95 successfully loaded 8 ZeRO state_dicts for rank 270 loading 8 zero partition checkpoints for rank 96 successfully loaded 8 ZeRO state_dicts for rank 129 successfully loaded 8 ZeRO state_dicts for rank 75 successfully loaded 8 ZeRO state_dicts for rank 466 successfully loaded 8 ZeRO state_dicts for rank 226 loading 8 zero partition checkpoints for rank 216 successfully loaded 8 ZeRO state_dicts for rank 224 successfully loaded 8 ZeRO state_dicts for rank 280 loading 8 zero partition checkpoints for rank 285 loading 8 zero partition checkpoints for rank 341 successfully loaded 8 ZeRO state_dicts for rank 92 loading 8 zero partition checkpoints for rank 251 successfully loaded 8 ZeRO state_dicts for rank 29 successfully loaded 8 ZeRO state_dicts for rank 411 successfully loaded 8 ZeRO state_dicts for rank 507 loading 8 zero partition checkpoints for rank 408 successfully loaded 8 ZeRO state_dicts for rank 171 loading 8 zero partition checkpoints for rank 446 successfully loaded 8 ZeRO state_dicts for rank 146 loading 8 zero partition checkpoints for rank 340 loading 8 zero partition checkpoints for rank 245 loading 8 zero partition checkpoints for rank 430 successfully loaded 8 ZeRO state_dicts for rank 327 successfully loaded 8 ZeRO state_dicts for rank 331 loading 8 zero partition checkpoints for rank 381 loading 8 zero partition checkpoints for rank 364 successfully loaded 8 ZeRO state_dicts for rank 20 loading 8 zero partition checkpoints for rank 132 successfully loaded 8 ZeRO state_dicts for rank 282 loading 8 zero partition checkpoints for rank 233 loading 8 zero partition checkpoints for rank 243 successfully loaded 8 ZeRO state_dicts for rank 452 loading 8 zero partition checkpoints for rank 431 successfully loaded 8 ZeRO state_dicts for rank 305 successfully loaded 8 ZeRO state_dicts for rank 21 successfully loaded 8 ZeRO state_dicts for rank 169 loading 8 zero partition checkpoints for rank 86 loading 8 zero partition checkpoints for rank 442 successfully loaded 8 ZeRO state_dicts for rank 330 loading 8 zero partition checkpoints for rank 124 successfully loaded 8 ZeRO state_dicts for rank 286 loading 8 zero partition checkpoints for rank 175 successfully loaded 8 ZeRO state_dicts for rank 326 successfully loaded 8 ZeRO state_dicts for rank 454 loading 8 zero partition checkpoints for rank 155 successfully loaded 8 ZeRO state_dicts for rank 476 successfully loaded 8 ZeRO state_dicts for rank 102 successfully loaded 8 ZeRO state_dicts for rank 300 loading 8 zero partition checkpoints for rank 250 successfully loaded 8 ZeRO state_dicts for rank 1 loading 8 zero partition checkpoints for rank 435 successfully loaded 8 ZeRO state_dicts for rank 15 successfully loaded 8 ZeRO state_dicts for rank 38 successfully loaded 8 ZeRO state_dicts for rank 328 successfully loaded 8 ZeRO state_dicts for rank 0 loading 8 zero partition checkpoints for rank 172 successfully loaded 8 ZeRO state_dicts for rank 463 loading 8 zero partition checkpoints for rank 219 successfully loaded 8 ZeRO state_dicts for rank 320 loading 8 zero partition checkpoints for rank 218 successfully loaded 8 ZeRO state_dicts for rank 56 successfully loaded 8 ZeRO state_dicts for rank 271 loading 8 zero partition checkpoints for rank 150 successfully loaded 8 ZeRO state_dicts 
for rank 287 loading 8 zero partition checkpoints for rank 309 successfully loaded 8 ZeRO state_dicts for rank 19 successfully loaded 8 ZeRO state_dicts for rank 24 successfully loaded 8 ZeRO state_dicts for rank 27 successfully loaded 8 ZeRO state_dicts for rank 112 successfully loaded 8 ZeRO state_dicts for rank 415 successfully loaded 8 ZeRO state_dicts for rank 310 loading 8 zero partition checkpoints for rank 365 loading 8 zero partition checkpoints for rank 240 successfully loaded 8 ZeRO state_dicts for rank 78 loading 8 zero partition checkpoints for rank 260 loading 8 zero partition checkpoints for rank 342 loading 8 zero partition checkpoints for rank 313 loading 8 zero partition checkpoints for rank 438 successfully loaded 8 ZeRO state_dicts for rank 161 successfully loaded 8 ZeRO state_dicts for rank 308 successfully loaded 8 ZeRO state_dicts for rank 32 successfully loaded 8 ZeRO state_dicts for rank 344 loading 8 zero partition checkpoints for rank 289 successfully loaded 8 ZeRO state_dicts for rank 185 successfully loaded 8 ZeRO state_dicts for rank 49 successfully loaded 8 ZeRO state_dicts for rank 57 successfully loaded 8 ZeRO state_dicts for rank 484 successfully loaded 8 ZeRO state_dicts for rank 487 loading 8 zero partition checkpoints for rank 252 successfully loaded 8 ZeRO state_dicts for rank 348 loading 8 zero partition checkpoints for rank 239 successfully loaded 8 ZeRO state_dicts for rank 25 loading 8 zero partition checkpoints for rank 120 loading 8 zero partition checkpoints for rank 276 loading 8 zero partition checkpoints for rank 425 loading 8 zero partition checkpoints for rank 382 loading 8 zero partition checkpoints for rank 173 successfully loaded 8 ZeRO state_dicts for rank 494 successfully loaded 8 ZeRO state_dicts for rank 162 successfully loaded 8 ZeRO state_dicts for rank 135 successfully loaded 8 ZeRO state_dicts for rank 504 successfully loaded 8 ZeRO state_dicts for rank 407 successfully loaded 8 ZeRO state_dicts for rank 13 loading 8 zero partition checkpoints for rank 247 loading 8 zero partition checkpoints for rank 390 successfully loaded 8 ZeRO state_dicts for rank 319 loading 8 zero partition checkpoints for rank 377 loading 8 zero partition checkpoints for rank 64 loading 8 zero partition checkpoints for rank 351 successfully loaded 8 ZeRO state_dicts for rank 48 successfully loaded 8 ZeRO state_dicts for rank 306 loading 8 zero partition checkpoints for rank 412 loading 8 zero partition checkpoints for rank 195 loading 8 zero partition checkpoints for rank 369 loading 8 zero partition checkpoints for rank 439 loading 8 zero partition checkpoints for rank 121 loading 8 zero partition checkpoints for rank 343 successfully loaded 8 ZeRO state_dicts for rank 405 successfully loaded 8 ZeRO state_dicts for rank 106 loading 8 zero partition checkpoints for rank 154 loading 8 zero partition checkpoints for rank 396 loading 8 zero partition checkpoints for rank 167 loading 8 zero partition checkpoints for rank 231 loading 8 zero partition checkpoints for rank 352 loading 8 zero partition checkpoints for rank 238 successfully loaded 8 ZeRO state_dicts for rank 283 loading 8 zero partition checkpoints for rank 182 loading 8 zero partition checkpoints for rank 137 successfully loaded 8 ZeRO state_dicts for rank 346 successfully loaded 8 ZeRO state_dicts for rank 170 loading 8 zero partition checkpoints for rank 217 loading 8 zero partition checkpoints for rank 193 loading 8 zero partition checkpoints for rank 141 successfully loaded 8 ZeRO state_dicts 
for rank 33 loading 8 zero partition checkpoints for rank 263 successfully loaded 8 ZeRO state_dicts for rank 145 successfully loaded 8 ZeRO state_dicts for rank 473 successfully loaded 8 ZeRO state_dicts for rank 98 loading 8 zero partition checkpoints for rank 149 loading 8 zero partition checkpoints for rank 136 successfully loaded 8 ZeRO state_dicts for rank 360 loading 8 zero partition checkpoints for rank 177 successfully loaded 8 ZeRO state_dicts for rank 510 loading 8 zero partition checkpoints for rank 249 successfully loaded 8 ZeRO state_dicts for rank 490 successfully loaded 8 ZeRO state_dicts for rank 52 successfully loaded 8 ZeRO state_dicts for rank 325 successfully loaded 8 ZeRO state_dicts for rank 133 loading 8 zero partition checkpoints for rank 262 successfully loaded 8 ZeRO state_dicts for rank 506 loading 8 zero partition checkpoints for rank 395 successfully loaded 8 ZeRO state_dicts for rank 147 loading 8 zero partition checkpoints for rank 99 loading 8 zero partition checkpoints for rank 312 loading 8 zero partition checkpoints for rank 63 successfully loaded 8 ZeRO state_dicts for rank 458 loading 8 zero partition checkpoints for rank 127 successfully loaded 8 ZeRO state_dicts for rank 119 successfully loaded 8 ZeRO state_dicts for rank 134 loading 8 zero partition checkpoints for rank 164 loading 8 zero partition checkpoints for rank 205 loading 8 zero partition checkpoints for rank 212 loading 8 zero partition checkpoints for rank 288 loading 8 zero partition checkpoints for rank 335 loading 8 zero partition checkpoints for rank 383 loading 8 zero partition checkpoints for rank 189 loading 8 zero partition checkpoints for rank 166 successfully loaded 8 ZeRO state_dicts for rank 51 loading 8 zero partition checkpoints for rank 236 loading 8 zero partition checkpoints for rank 77 loading 8 zero partition checkpoints for rank 188 loading 8 zero partition checkpoints for rank 410 successfully loaded 8 ZeRO state_dicts for rank 55 loading 8 zero partition checkpoints for rank 148 loading 8 zero partition checkpoints for rank 417 loading 8 zero partition checkpoints for rank 105 successfully loaded 8 ZeRO state_dicts for rank 457 successfully loaded 8 ZeRO state_dicts for rank 2 successfully loaded 8 ZeRO state_dicts for rank 42 loading 8 zero partition checkpoints for rank 402 successfully loaded 8 ZeRO state_dicts for rank 500 loading 8 zero partition checkpoints for rank 434 successfully loaded 8 ZeRO state_dicts for rank 94 loading 8 zero partition checkpoints for rank 82 loading 8 zero partition checkpoints for rank 358 successfully loaded 8 ZeRO state_dicts for rank 491 loading 8 zero partition checkpoints for rank 61 successfully loaded 8 ZeRO state_dicts for rank 16 successfully loaded 8 ZeRO state_dicts for rank 58 loading 8 zero partition checkpoints for rank 65 loading 8 zero partition checkpoints for rank 349 loading 8 zero partition checkpoints for rank 295 successfully loaded 8 ZeRO state_dicts for rank 17 successfully loaded 8 ZeRO state_dicts for rank 464 loading 8 zero partition checkpoints for rank 404 loading 8 zero partition checkpoints for rank 363 successfully loaded 8 ZeRO state_dicts for rank 499 successfully loaded 8 ZeRO state_dicts for rank 461 successfully loaded 8 ZeRO state_dicts for rank 44 successfully loaded 8 ZeRO state_dicts for rank 50 successfully loaded 8 ZeRO state_dicts for rank 477 successfully loaded 8 ZeRO state_dicts for rank 45 loading 8 zero partition checkpoints for rank 159 successfully loaded 8 ZeRO state_dicts for rank 
30 loading 8 zero partition checkpoints for rank 345 successfully loaded 8 ZeRO state_dicts for rank 53 loading 8 zero partition checkpoints for rank 353 successfully loaded 8 ZeRO state_dicts for rank 23 loading 8 zero partition checkpoints for rank 334 loading 8 zero partition checkpoints for rank 315 loading 8 zero partition checkpoints for rank 333 loading 8 zero partition checkpoints for rank 273 loading 8 zero partition checkpoints for rank 8 successfully loaded 8 ZeRO state_dicts for rank 503 successfully loaded 8 ZeRO state_dicts for rank 12 loading 8 zero partition checkpoints for rank 244 loading 8 zero partition checkpoints for rank 151 loading 8 zero partition checkpoints for rank 370 successfully loaded 8 ZeRO state_dicts for rank 59 successfully loaded 8 ZeRO state_dicts for rank 31 loading 8 zero partition checkpoints for rank 311 loading 8 zero partition checkpoints for rank 426 successfully loaded 8 ZeRO state_dicts for rank 486 loading 8 zero partition checkpoints for rank 399 successfully loaded 8 ZeRO state_dicts for rank 26 loading 8 zero partition checkpoints for rank 474 loading 8 zero partition checkpoints for rank 200 successfully loaded 8 ZeRO state_dicts for rank 54 loading 8 zero partition checkpoints for rank 101 successfully loaded 8 ZeRO state_dicts for rank 46 loading 8 zero partition checkpoints for rank 139 successfully loaded 8 ZeRO state_dicts for rank 498 successfully loaded 8 ZeRO state_dicts for rank 307 loading 8 zero partition checkpoints for rank 471 successfully loaded 8 ZeRO state_dicts for rank 469 successfully loaded 8 ZeRO state_dicts for rank 495 successfully loaded 8 ZeRO state_dicts for rank 22 loading 8 zero partition checkpoints for rank 104 loading 8 zero partition checkpoints for rank 272 successfully loaded 8 ZeRO state_dicts for rank 28 loading 8 zero partition checkpoints for rank 91 loading 8 zero partition checkpoints for rank 160 loading 8 zero partition checkpoints for rank 354 loading 8 zero partition checkpoints for rank 267 successfully loaded 8 ZeRO state_dicts for rank 467 loading 8 zero partition checkpoints for rank 317 loading 8 zero partition checkpoints for rank 361 loading 8 zero partition checkpoints for rank 281 loading 8 zero partition checkpoints for rank 73 loading 8 zero partition checkpoints for rank 103 loading 8 zero partition checkpoints for rank 107 successfully loaded 8 ZeRO state_dicts for rank 34 loading 8 zero partition checkpoints for rank 202 loading 8 zero partition checkpoints for rank 140 successfully loaded 8 ZeRO state_dicts for rank 14 loading 8 zero partition checkpoints for rank 255 successfully loaded 8 ZeRO state_dicts for rank 482 loading 8 zero partition checkpoints for rank 293 loading 8 zero partition checkpoints for rank 220 loading 8 zero partition checkpoints for rank 368 loading 8 zero partition checkpoints for rank 201 successfully loaded 8 ZeRO state_dicts for rank 483 loading 8 zero partition checkpoints for rank 269 loading 8 zero partition checkpoints for rank 355 loading 8 zero partition checkpoints for rank 168 loading 8 zero partition checkpoints for rank 427 loading 8 zero partition checkpoints for rank 318 loading 8 zero partition checkpoints for rank 284 loading 8 zero partition checkpoints for rank 122 loading 8 zero partition checkpoints for rank 93 loading 8 zero partition checkpoints for rank 418 loading 8 zero partition checkpoints for rank 191 loading 8 zero partition checkpoints for rank 203 loading 8 zero partition checkpoints for rank 359 loading 8 zero partition 
checkpoints for rank 291 loading 8 zero partition checkpoints for rank 207 loading 8 zero partition checkpoints for rank 268 loading 8 zero partition checkpoints for rank 316 loading 8 zero partition checkpoints for rank 187 loading 8 zero partition checkpoints for rank 371 loading 8 zero partition checkpoints for rank 131 loading 8 zero partition checkpoints for rank 97 successfully loaded 8 ZeRO state_dicts for rank 502 loading 8 zero partition checkpoints for rank 398 loading 8 zero partition checkpoints for rank 156 successfully loaded 8 ZeRO state_dicts for rank 18 successfully loaded 8 ZeRO state_dicts for rank 508 loading 8 zero partition checkpoints for rank 215 loading 8 zero partition checkpoints for rank 290 successfully loaded 8 ZeRO state_dicts for rank 497 successfully loaded 8 ZeRO state_dicts for rank 496 loading 8 zero partition checkpoints for rank 117 loading 8 zero partition checkpoints for rank 138 successfully loaded 8 ZeRO state_dicts for rank 493 loading 8 zero partition checkpoints for rank 79 loading 8 zero partition checkpoints for rank 181 loading 8 zero partition checkpoints for rank 209 successfully loaded 8 ZeRO state_dicts for rank 488 successfully loaded 8 ZeRO state_dicts for rank 485 loading 8 zero partition checkpoints for rank 89 loading 8 zero partition checkpoints for rank 157 successfully loaded 8 ZeRO state_dicts for rank 489 successfully loaded 8 ZeRO state_dicts for rank 501 loading 8 zero partition checkpoints for rank 176 loading 8 zero partition checkpoints for rank 468 loading 8 zero partition checkpoints for rank 143 loading 8 zero partition checkpoints for rank 100 loading 8 zero partition checkpoints for rank 223 loading 8 zero partition checkpoints for rank 87 loading 8 zero partition checkpoints for rank 74 loading 8 zero partition checkpoints for rank 258 successfully loaded 8 ZeRO state_dicts for rank 480 loading 8 zero partition checkpoints for rank 406 loading 8 zero partition checkpoints for rank 183 loading 8 zero partition checkpoints for rank 190 loading 8 zero partition checkpoints for rank 275 loading 8 zero partition checkpoints for rank 71 loading 8 zero partition checkpoints for rank 85 loading 8 zero partition checkpoints for rank 329 successfully loaded 8 ZeRO state_dicts for rank 511 loading 8 zero partition checkpoints for rank 72 successfully loaded 8 ZeRO state_dicts for rank 492 loading 8 zero partition checkpoints for rank 211 loading 8 zero partition checkpoints for rank 357 loading 8 zero partition checkpoints for rank 321 loading 8 zero partition checkpoints for rank 322 loading 8 zero partition checkpoints for rank 118 loading 8 zero partition checkpoints for rank 113 loading 8 zero partition checkpoints for rank 142 loading 8 zero partition checkpoints for rank 213 loading 8 zero partition checkpoints for rank 478 loading 8 zero partition checkpoints for rank 460 successfully loaded 8 ZeRO state_dicts for rank 509 loading 8 zero partition checkpoints for rank 186 loading 8 zero partition checkpoints for rank 253 loading 8 zero partition checkpoints for rank 228 loading 8 zero partition checkpoints for rank 419 successfully loaded 8 ZeRO state_dicts for rank 505 loading 8 zero partition checkpoints for rank 266 loading 8 zero partition checkpoints for rank 413 loading 8 zero partition checkpoints for rank 254 loading 8 zero partition checkpoints for rank 470 loading 8 zero partition checkpoints for rank 9 loading 8 zero partition checkpoints for rank 11 loading 8 zero partition checkpoints for rank 184 loading 8 
zero partition checkpoints for rank 274 loading 8 zero partition checkpoints for rank 324 loading 8 zero partition checkpoints for rank 314 loading 8 zero partition checkpoints for rank 362 loading 8 zero partition checkpoints for rank 294 loading 8 zero partition checkpoints for rank 90 loading 8 zero partition checkpoints for rank 409 loading 8 zero partition checkpoints for rank 41 loading 8 zero partition checkpoints for rank 450 loading 8 zero partition checkpoints for rank 448 loading 8 zero partition checkpoints for rank 259 loading 8 zero partition checkpoints for rank 179 loading 8 zero partition checkpoints for rank 270 loading 8 zero partition checkpoints for rank 356 loading 8 zero partition checkpoints for rank 165 successfully loaded 8 ZeRO state_dicts for rank 465 loading 8 zero partition checkpoints for rank 214 loading 8 zero partition checkpoints for rank 221 loading 8 zero partition checkpoints for rank 83 loading 8 zero partition checkpoints for rank 76 loading 8 zero partition checkpoints for rank 414 loading 8 zero partition checkpoints for rank 95 loading 8 zero partition checkpoints for rank 114 successfully loaded 8 ZeRO state_dicts for rank 110 loading 8 zero partition checkpoints for rank 37 loading 8 zero partition checkpoints for rank 116 loading 8 zero partition checkpoints for rank 10 loading 8 zero partition checkpoints for rank 411 successfully loaded 8 ZeRO state_dicts for rank 481 loading 8 zero partition checkpoints for rank 331 loading 8 zero partition checkpoints for rank 397 loading 8 zero partition checkpoints for rank 222 loading 8 zero partition checkpoints for rank 230 loading 8 zero partition checkpoints for rank 326 loading 8 zero partition checkpoints for rank 102 loading 8 zero partition checkpoints for rank 286 loading 8 zero partition checkpoints for rank 75 loading 8 zero partition checkpoints for rank 287 loading 8 zero partition checkpoints for rank 453 loading 8 zero partition checkpoints for rank 163 loading 8 zero partition checkpoints for rank 280 loading 8 zero partition checkpoints for rank 305 loading 8 zero partition checkpoints for rank 271 loading 8 zero partition checkpoints for rank 62 loading 8 zero partition checkpoints for rank 78 loading 8 zero partition checkpoints for rank 144 loading 8 zero partition checkpoints for rank 282 loading 8 zero partition checkpoints for rank 310 loading 8 zero partition checkpoints for rank 456 loading 8 zero partition checkpoints for rank 308 loading 8 zero partition checkpoints for rank 92 loading 8 zero partition checkpoints for rank 66 loading 8 zero partition checkpoints for rank 161 loading 8 zero partition checkpoints for rank 47 loading 8 zero partition checkpoints for rank 472 loading 8 zero partition checkpoints for rank 43 loading 8 zero partition checkpoints for rank 350 loading 8 zero partition checkpoints for rank 372 loading 8 zero partition checkpoints for rank 35 loading 8 zero partition checkpoints for rank 130 loading 8 zero partition checkpoints for rank 70 loading 8 zero partition checkpoints for rank 60 loading 8 zero partition checkpoints for rank 1 loading 8 zero partition checkpoints for rank 38 loading 8 zero partition checkpoints for rank 374 loading 8 zero partition checkpoints for rank 29 loading 8 zero partition checkpoints for rank 407 loading 8 zero partition checkpoints for rank 210 loading 8 zero partition checkpoints for rank 67 loading 8 zero partition checkpoints for rank 171 loading 8 zero partition checkpoints for rank 80 loading 8 zero partition 
checkpoints for rank 449 loading 8 zero partition checkpoints for rank 106 loading 8 zero partition checkpoints for rank 81 loading 8 zero partition checkpoints for rank 347 loading 8 zero partition checkpoints for rank 479 loading 8 zero partition checkpoints for rank 405 loading 8 zero partition checkpoints for rank 346 loading 8 zero partition checkpoints for rank 98 loading 8 zero partition checkpoints for rank 283 loading 8 zero partition checkpoints for rank 264 loading 8 zero partition checkpoints for rank 415 loading 8 zero partition checkpoints for rank 68 loading 8 zero partition checkpoints for rank 475 loading 8 zero partition checkpoints for rank 40 loading 8 zero partition checkpoints for rank 145 loading 8 zero partition checkpoints for rank 27 loading 8 zero partition checkpoints for rank 24 loading 8 zero partition checkpoints for rank 162 loading 8 zero partition checkpoints for rank 459 loading 8 zero partition checkpoints for rank 134 loading 8 zero partition checkpoints for rank 25 loading 8 zero partition checkpoints for rank 56 loading 8 zero partition checkpoints for rank 109 loading 8 zero partition checkpoints for rank 303 loading 8 zero partition checkpoints for rank 119 loading 8 zero partition checkpoints for rank 115 loading 8 zero partition checkpoints for rank 319 loading 8 zero partition checkpoints for rank 185 loading 8 zero partition checkpoints for rank 208 loading 8 zero partition checkpoints for rank 13 loading 8 zero partition checkpoints for rank 476 loading 8 zero partition checkpoints for rank 375 loading 8 zero partition checkpoints for rank 348 loading 8 zero partition checkpoints for rank 57 loading 8 zero partition checkpoints for rank 360 loading 8 zero partition checkpoints for rank 33 loading 8 zero partition checkpoints for rank 15 loading 8 zero partition checkpoints for rank 328 loading 8 zero partition checkpoints for rank 330 loading 8 zero partition checkpoints for rank 129 loading 8 zero partition checkpoints for rank 323 loading 8 zero partition checkpoints for rank 327 loading 8 zero partition checkpoints for rank 21 loading 8 zero partition checkpoints for rank 487 loading 8 zero partition checkpoints for rank 112 loading 8 zero partition checkpoints for rank 373 loading 8 zero partition checkpoints for rank 506 loading 8 zero partition checkpoints for rank 504 loading 8 zero partition checkpoints for rank 48 loading 8 zero partition checkpoints for rank 510 loading 8 zero partition checkpoints for rank 301 loading 8 zero partition checkpoints for rank 344 loading 8 zero partition checkpoints for rank 42 loading 8 zero partition checkpoints for rank 2 loading 8 zero partition checkpoints for rank 300 loading 8 zero partition checkpoints for rank 320 loading 8 zero partition checkpoints for rank 51 loading 8 zero partition checkpoints for rank 20 loading 8 zero partition checkpoints for rank 462 loading 8 zero partition checkpoints for rank 36 loading 8 zero partition checkpoints for rank 304 loading 8 zero partition checkpoints for rank 500 loading 8 zero partition checkpoints for rank 473 loading 8 zero partition checkpoints for rank 461 loading 8 zero partition checkpoints for rank 307 loading 8 zero partition checkpoints for rank 491 loading 8 zero partition checkpoints for rank 451 loading 8 zero partition checkpoints for rank 45 loading 8 zero partition checkpoints for rank 325 loading 8 zero partition checkpoints for rank 507 loading 8 zero partition checkpoints for rank 23 loading 8 zero partition checkpoints for rank 44 
loading 8 zero partition checkpoints for rank 32 loading 8 zero partition checkpoints for rank 52 loading 8 zero partition checkpoints for rank 30 loading 8 zero partition checkpoints for rank 53 loading 8 zero partition checkpoints for rank 477 loading 8 zero partition checkpoints for rank 94 loading 8 zero partition checkpoints for rank 58 loading 8 zero partition checkpoints for rank 31 loading 8 zero partition checkpoints for rank 59 loading 8 zero partition checkpoints for rank 39 loading 8 zero partition checkpoints for rank 26 loading 8 zero partition checkpoints for rank 146 loading 8 zero partition checkpoints for rank 452 loading 8 zero partition checkpoints for rank 302 loading 8 zero partition checkpoints for rank 28 loading 8 zero partition checkpoints for rank 17 loading 8 zero partition checkpoints for rank 108 loading 8 zero partition checkpoints for rank 469 loading 8 zero partition checkpoints for rank 169 loading 8 zero partition checkpoints for rank 46 loading 8 zero partition checkpoints for rank 306 loading 8 zero partition checkpoints for rank 490 loading 8 zero partition checkpoints for rank 22 loading 8 zero partition checkpoints for rank 55 loading 8 zero partition checkpoints for rank 464 loading 8 zero partition checkpoints for rank 457 loading 8 zero partition checkpoints for rank 463 loading 8 zero partition checkpoints for rank 458 loading 8 zero partition checkpoints for rank 170 loading 8 zero partition checkpoints for rank 34 loading 8 zero partition checkpoints for rank 147 loading 8 zero partition checkpoints for rank 488 loading 8 zero partition checkpoints for rank 18 loading 8 zero partition checkpoints for rank 485 loading 8 zero partition checkpoints for rank 111 loading 8 zero partition checkpoints for rank 501 loading 8 zero partition checkpoints for rank 493 loading 8 zero partition checkpoints for rank 455 loading 8 zero partition checkpoints for rank 225 loading 8 zero partition checkpoints for rank 12 loading 8 zero partition checkpoints for rank 467 loading 8 zero partition checkpoints for rank 49 loading 8 zero partition checkpoints for rank 14 loading 8 zero partition checkpoints for rank 492 loading 8 zero partition checkpoints for rank 3 loading 8 zero partition checkpoints for rank 54 loading 8 zero partition checkpoints for rank 454 loading 8 zero partition checkpoints for rank 227 loading 8 zero partition checkpoints for rank 19 loading 8 zero partition checkpoints for rank 509 loading 8 zero partition checkpoints for rank 489 loading 8 zero partition checkpoints for rank 0 loading 8 zero partition checkpoints for rank 226 checkpoint version 3.0 loading 8 zero partition checkpoints for rank 466 loading 8 zero partition checkpoints for rank 499 loading 8 zero partition checkpoints for rank 484 loading 8 zero partition checkpoints for rank 224 loading 8 zero partition checkpoints for rank 135 loading 8 zero partition checkpoints for rank 50 loading 8 zero partition checkpoints for rank 110 loading 8 zero partition checkpoints for rank 505 loading 8 zero partition checkpoints for rank 497 loading 8 zero partition checkpoints for rank 496 loading 8 zero partition checkpoints for rank 498 loading 8 zero partition checkpoints for rank 494 loading 8 zero partition checkpoints for rank 133 loading 8 zero partition checkpoints for rank 486 loading 8 zero partition checkpoints for rank 16 loading 8 zero partition checkpoints for rank 495 loading 8 zero partition checkpoints for rank 503 loading 8 zero partition checkpoints for rank 465 loading 8 
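For context, the restore phase summarized above is DeepSpeed reassembling ZeRO-partitioned optimizer state: every rank first loads its 8 ZeRO state_dict shards from disk, then applies the matching zero partition checkpoints to its optimizer. A minimal sketch of the shard-gathering step, assuming a hypothetical zero_pp_rank_<rank>_shard_<i>.pt file layout (the real DeepSpeed naming and merge logic differ):

    import glob
    import torch

    def load_zero_shards(ckpt_dir, rank):
        # Hypothetical file layout, for illustration only; DeepSpeed's actual
        # checkpoint naming and merging live inside its engine code.
        paths = sorted(glob.glob(f"{ckpt_dir}/zero_pp_rank_{rank}_shard_*.pt"))
        shards = [torch.load(p, map_location="cpu") for p in paths]
        print(f"successfully loaded {len(shards)} ZeRO state_dicts for rank {rank}")
        return shards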
successfully loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints at iteration 9768
time (ms) | load-checkpoint: 91243.56
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-09-27 03:56:36
> building train, validation, and test datasets ...
> datasets target sizes (minimum size):
    train:      300000000
    validation: 1638400
    test:       10240
> building train, validation, and test datasets for GPT ...
> building dataset index ...
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
> finished creating indexed dataset in 0.143013 seconds
    number of documents: 304230423
> dataset split:
    train:      document indices in [0, 288714672) total of 288714672 documents
    validation: document indices in [288714672, 303926193) total of 15211521 documents
    test:       document indices in [303926193, 304230423) total of 304230 documents
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_shuffle_idx.npy
    loaded indexed file in 0.289 seconds
    total number of samples: 394611670
    total number of epochs: 3
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_42s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_42s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_42s_shuffle_idx.npy
    loaded indexed file in 0.388 seconds
    total number of samples: 6927161
    total number of epochs: 1
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_42s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_42s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_42s_shuffle_idx.npy
    loaded indexed file in 0.061 seconds
    total number of samples: 137384
    total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-09-27 03:56:43
done with setup ...
training ...
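The split boundaries above are consistent with a Megatron-style 949/50/1 train/validation/test split over the 304230423 OSCAR documents. A quick check, using only the numbers logged above:

    # Arithmetic check of the logged document-index boundaries.
    total = 304_230_423
    splits = {
        "train":      (0,           288_714_672),
        "validation": (288_714_672, 303_926_193),
        "test":       (303_926_193, 304_230_423),
    }
    for name, (lo, hi) in splits.items():
        print(f"{name:10s} {hi - lo:>9d} docs  {100 * (hi - lo) / total:6.2f}%")
    # -> train 94.90%, validation 5.00%, test 0.10%  (i.e. a 949/50/1 split)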
time (ms) | model-and-optimizer-setup: 102057.80 | train/valid/test-data-iterators-setup: 5731.66
[before the start of training step] datetime: 2021-09-27 03:56:43
[2021-09-27 03:56:43,457] [INFO] [checkpointing.py:408:forward] Activation Checkpointing Information
[2021-09-27 03:56:43,457] [INFO] [checkpointing.py:409:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-09-27 03:56:43,457] [INFO] [checkpointing.py:412:forward] ----contiguous Memory Checkpointing False with 32 total layers
[2021-09-27 03:56:43,457] [INFO] [checkpointing.py:415:forward] ----Synchronization False
[2021-09-27 03:56:43,457] [INFO] [checkpointing.py:416:forward] ----Profiling time in checkpointing False
[Rank 192] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 10290.1357421875 | reserved: 15132.0 | max reserved: 15132.0
[Rank 129] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 10562.13623046875 | reserved: 15500.0 | max reserved: 15500.0
[Rank 130] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 10562.13623046875 | reserved: 15364.0 | max reserved: 15364.0
[Rank 64] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 10834.13671875 | reserved: 15820.0 | max reserved: 15820.0
[Rank 0] (after 9770 iterations) memory (MB) | allocated: 5267.49951171875 | max allocated: 12476.68310546875 | reserved: 18256.0 | max reserved: 18256.0
[Rank 2] (after 9770 iterations) memory (MB) | allocated: 5267.49951171875 | max allocated: 12476.68310546875 | reserved: 17788.0 | max reserved: 17788.0
[Rank 256] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 10018.13525390625 | reserved: 14812.0 | max reserved: 14812.0
[Rank 257] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 10018.13525390625 | reserved: 14940.0 | max reserved: 14940.0
[Rank 193] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 10290.1357421875 | reserved: 15096.0 | max reserved: 15096.0
[Rank 194] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 10290.1357421875 | reserved: 15112.0 | max reserved: 15112.0
[Rank 128] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 10562.13623046875 | reserved: 15456.0 | max reserved: 15456.0
[Rank 385] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 9474.13427734375 | reserved: 14312.0 | max reserved: 14312.0
[Rank 320] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 9746.134765625 | reserved: 14716.0 | max reserved: 14716.0
[Rank 65] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 10834.13671875 | reserved: 15632.0 | max reserved: 15632.0
[Rank 1] (after 9770 iterations) memory (MB) | allocated: 5267.49951171875 | max allocated: 12476.68310546875 | reserved: 18256.0 | max reserved: 18256.0
[Rank 258] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 10018.13525390625 | reserved: 14696.0 | max reserved: 14696.0
[Rank 131] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 10562.13623046875 | reserved: 15532.0 | max reserved: 15532.0
[Rank 384] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 9474.13427734375 | reserved: 14268.0 | max reserved: 14268.0
[Rank 449] (after 9770 iterations) memory (MB) | allocated: 5685.35986328125 | max allocated: 10463.337890625 | reserved: 15736.0 | max reserved: 15736.0
[Rank 448] (after 9770 iterations) memory (MB) | allocated: 5685.35986328125 | max allocated: 10463.33642578125 | reserved: 15736.0 | max reserved: 15736.0
[Rank 322] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 9746.134765625 | reserved: 14616.0 | max reserved: 14616.0
[Rank 66] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 10834.13671875 | reserved: 15828.0 | max reserved: 15828.0
[Rank 3] (after 9770 iterations) memory (MB) | allocated: 5267.49951171875 | max allocated: 12476.68310546875 | reserved: 18256.0 | max reserved: 18256.0
[Rank 259] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 10018.13525390625 | reserved: 14712.0 | max reserved: 14712.0
[Rank 195] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 10290.1357421875 | reserved: 15208.0 | max reserved: 15208.0
[Rank 387] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 9474.13427734375 | reserved: 14312.0 | max reserved: 14312.0
[Rank 386] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 9474.13427734375 | reserved: 14312.0 | max reserved: 14312.0
[Rank 451] (after 9770 iterations) memory (MB) | allocated: 5685.35986328125 | max allocated: 10463.3369140625 | reserved: 15736.0 | max reserved: 15736.0
[Rank 323] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 9746.134765625 | reserved: 14648.0 | max reserved: 14648.0
[Rank 67] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 10834.13671875 | reserved: 15536.0 | max reserved: 15536.0
[Rank 450] (after 9770 iterations) memory (MB) | allocated: 5685.35986328125 | max allocated: 10463.33544921875 | reserved: 15736.0 | max reserved: 15736.0
[Rank 321] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 9746.134765625 | reserved: 14684.0 | max reserved: 14684.0
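The per-rank memory report above is read straight off the CUDA caching allocator's counters; a minimal sketch of how such a line can be produced with stock PyTorch (Megatron's own report_memory helper works along these lines):

    import torch

    def report_memory(name):
        # All four counters are standard torch.cuda allocator statistics,
        # converted from bytes to MB as in the log lines above.
        mega = 1024 * 1024
        print(f"[{name}] memory (MB)"
              f" | allocated: {torch.cuda.memory_allocated() / mega}"
              f" | max allocated: {torch.cuda.max_memory_allocated() / mega}"
              f" | reserved: {torch.cuda.memory_reserved() / mega}"
              f" | max reserved: {torch.cuda.max_memory_reserved() / mega}")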
(ms): 14105.8 | learning rate: 6.000E-05 | global batch size: 256 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9830/ 159576 | consumed samples: 716656 | elapsed time per iteration (ms): 14030.2 | learning rate: 6.000E-05 | global batch size: 256 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9840/ 159576 | consumed samples: 719216 | elapsed time per iteration (ms): 14188.9 | learning rate: 6.000E-05 | global batch size: 256 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-27 04:14:28] PULSE: tr8-104B is running for 20:12 since 2021-09-27T03:54:16 (1188168 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r6i7n[7-8],r7i0n[0-5],r7i1n[7-8],r7i2n[0-1,5,8],r7i3n2,r7i5n7,r7i6n[1-4,8],r7i7n[0-4,6-8],r8i0n[0-8],r8i1n[0-4],r8i2n8,r8i3n[0-3,8],r8i4n[0-1],r8i6n[2-3,5-6],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n[0,3-8],r9i3n[0-2,6-8],r9i4n[0-6,8],r9i5n[0-8],r9i6n[0-8],r9i7n[1-8]) iteration 9850/ 159576 | consumed samples: 721776 | elapsed time per iteration (ms): 14071.1 | learning rate: 6.000E-05 | global batch size: 256 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9860/ 159576 | consumed samples: 724336 | elapsed time per iteration (ms): 14125.1 | learning rate: 6.000E-05 | global batch size: 256 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9870/ 159576 | consumed samples: 726896 | elapsed time per iteration (ms): 14170.2 | learning rate: 6.000E-05 | global batch size: 256 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9880/ 159576 | consumed samples: 729456 | elapsed time per iteration (ms): 14139.5 | learning rate: 6.000E-05 | global batch size: 256 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9890/ 159576 | consumed samples: 732016 | elapsed time per iteration (ms): 14156.0 | learning rate: 6.000E-05 | global batch size: 256 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9900/ 159576 | consumed samples: 734576 | elapsed time per iteration (ms): 14057.9 | learning rate: 6.000E-05 | global batch size: 256 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9910/ 159576 | consumed samples: 737136 | elapsed time per iteration (ms): 14129.8 | learning rate: 6.000E-05 | global batch size: 256 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9920/ 159576 | consumed samples: 739696 | elapsed time per iteration (ms): 14157.7 | learning rate: 6.000E-05 | global batch size: 256 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9930/ 159576 | consumed samples: 742256 | elapsed time per iteration (ms): 14024.1 | learning rate: 6.000E-05 | global batch size: 256 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | 
iteration 9940/ 159576 | consumed samples: 744816 | elapsed time per iteration (ms): 13971.4 | learning rate: 6.000E-05 | global batch size: 256 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 9950/ 159576 | consumed samples: 747376 | elapsed time per iteration (ms): 14101.5 | learning rate: 6.000E-05 | global batch size: 256 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 9960/ 159576 | consumed samples: 749936 | elapsed time per iteration (ms): 14210.0 | learning rate: 6.000E-05 | global batch size: 256 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 9970/ 159576 | consumed samples: 752496 | elapsed time per iteration (ms): 14219.6 | learning rate: 6.000E-05 | global batch size: 256 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 9980/ 159576 | consumed samples: 755056 | elapsed time per iteration (ms): 14117.6 | learning rate: 6.000E-05 | global batch size: 256 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 9990/ 159576 | consumed samples: 757712 | elapsed time per iteration (ms): 14400.0 | learning rate: 6.000E-05 | global batch size: 272 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-27 04:51:19,357] [INFO] [logging.py:68:log_dist] [Rank 0] step=10000, skipped=1052, lr=[5.999919375575235e-05, 5.999919375575235e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 10000 loss: nan iter time (s): 0.007 samples/sec: 37472.688
iteration 10000/ 159576 | consumed samples: 760432 | elapsed time per iteration (ms): 14648.0 | learning rate: 6.000E-05 | global batch size: 272 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 10000 | lm loss value: 7.270623E+00 | lm loss PPL: 1.437445E+03 |
-------------------------------------------------------------------------------------------------
iteration 10010/ 159576 | consumed samples: 763152 | elapsed time per iteration (ms): 16469.3 | learning rate: 6.000E-05 | global batch size: 272 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 10020/ 159576 | consumed samples: 765872 | elapsed time per iteration (ms): 14573.2 | learning rate: 6.000E-05 | global batch size: 272 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 10030/ 159576 | consumed samples: 768592 | elapsed time per iteration (ms): 14611.8 | learning rate: 6.000E-05 | global batch size: 272 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 10040/ 159576 | consumed samples: 771312 | elapsed time per iteration (ms): 14782.8 | learning rate: 6.000E-05 | global batch size: 272 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
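For reference, the "lm loss PPL" in the validation blocks is simply the exponential of "lm loss value"; a quick check of the iteration-10000 numbers:

    import math

    # exp(7.270623) ~= 1437.445, matching the reported lm loss PPL of 1.437445E+03
    print(math.exp(7.270623))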
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10050/ 159576 | consumed samples: 774032 | elapsed time per iteration (ms): 14722.8 | learning rate: 6.000E-05 | global batch size: 272 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10060/ 159576 | consumed samples: 776752 | elapsed time per iteration (ms): 14595.9 | learning rate: 6.000E-05 | global batch size: 272 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10070/ 159576 | consumed samples: 779472 | elapsed time per iteration (ms): 14712.5 | learning rate: 6.000E-05 | global batch size: 272 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10080/ 159576 | consumed samples: 782192 | elapsed time per iteration (ms): 14640.3 | learning rate: 6.000E-05 | global batch size: 272 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10090/ 159576 | consumed samples: 784912 | elapsed time per iteration (ms): 15060.9 | learning rate: 6.000E-05 | global batch size: 272 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-27 05:14:32] PULSE: tr8-104B is running for 1:20:16 since 2021-09-27T03:54:16 (1188168 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r6i7n[7-8],r7i0n[0-5],r7i1n[7-8],r7i2n[0-1,5,8],r7i3n2,r7i5n7,r7i6n[1-4,8],r7i7n[0-4,6-8],r8i0n[0-8],r8i1n[0-4],r8i2n8,r8i3n[0-3,8],r8i4n[0-1],r8i6n[2-3,5-6],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n[0,3-8],r9i3n[0-2,6-8],r9i4n[0-6,8],r9i5n[0-8],r9i6n[0-8],r9i7n[1-8]) iteration 10100/ 159576 | consumed samples: 787632 | elapsed time per iteration (ms): 14624.0 | learning rate: 6.000E-05 | global batch size: 272 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10110/ 159576 | consumed samples: 790352 | elapsed time per iteration (ms): 14621.7 | learning rate: 6.000E-05 | global batch size: 272 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10120/ 159576 | consumed samples: 793072 | elapsed time per iteration (ms): 14685.1 | learning rate: 6.000E-05 | global batch size: 272 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10130/ 159576 | consumed samples: 795792 | elapsed time per iteration (ms): 14531.8 | learning rate: 6.000E-05 | global batch size: 272 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10140/ 159576 | consumed samples: 798512 | elapsed time per iteration (ms): 14629.6 | learning rate: 6.000E-05 | global batch size: 272 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10150/ 159576 | consumed samples: 801232 | elapsed time per iteration (ms): 14771.8 | learning rate: 6.000E-05 | global batch size: 272 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10160/ 159576 | 
consumed samples: 803984 | elapsed time per iteration (ms): 14889.9 | learning rate: 6.000E-05 | global batch size: 288 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10170/ 159576 | consumed samples: 806864 | elapsed time per iteration (ms): 15471.9 | learning rate: 6.000E-05 | global batch size: 288 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10180/ 159576 | consumed samples: 809744 | elapsed time per iteration (ms): 15228.6 | learning rate: 6.000E-05 | global batch size: 288 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10190/ 159576 | consumed samples: 812624 | elapsed time per iteration (ms): 15425.1 | learning rate: 6.000E-05 | global batch size: 288 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10200/ 159576 | consumed samples: 815504 | elapsed time per iteration (ms): 15390.8 | learning rate: 6.000E-05 | global batch size: 288 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10210/ 159576 | consumed samples: 818384 | elapsed time per iteration (ms): 15293.9 | learning rate: 6.000E-05 | global batch size: 288 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10220/ 159576 | consumed samples: 821264 | elapsed time per iteration (ms): 15259.9 | learning rate: 6.000E-05 | global batch size: 288 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10230/ 159576 | consumed samples: 824144 | elapsed time per iteration (ms): 15547.4 | learning rate: 6.000E-05 | global batch size: 288 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10240/ 159576 | consumed samples: 827024 | elapsed time per iteration (ms): 15375.5 | learning rate: 6.000E-05 | global batch size: 288 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10250/ 159576 | consumed samples: 829904 | elapsed time per iteration (ms): 15322.8 | learning rate: 6.000E-05 | global batch size: 288 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10260/ 159576 | consumed samples: 832784 | elapsed time per iteration (ms): 15280.3 | learning rate: 6.000E-05 | global batch size: 288 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10270/ 159576 | consumed samples: 835664 | elapsed time per iteration (ms): 15390.4 | learning rate: 6.000E-05 | global batch size: 288 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10280/ 159576 | consumed samples: 838544 | elapsed time per iteration (ms): 15339.6 | learning rate: 6.000E-05 | global batch size: 288 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
time (ms) iteration 10290/ 159576 | consumed samples: 841424 | elapsed time per iteration (ms): 15252.5 | learning rate: 6.000E-05 | global batch size: 288 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10300/ 159576 | consumed samples: 844304 | elapsed time per iteration (ms): 15146.5 | learning rate: 6.000E-05 | global batch size: 288 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10310/ 159576 | consumed samples: 847184 | elapsed time per iteration (ms): 15389.7 | learning rate: 6.000E-05 | global batch size: 288 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10320/ 159576 | consumed samples: 850064 | elapsed time per iteration (ms): 15348.5 | learning rate: 6.000E-05 | global batch size: 288 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10330/ 159576 | consumed samples: 853072 | elapsed time per iteration (ms): 15779.0 | learning rate: 6.000E-05 | global batch size: 304 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-27 06:14:35] PULSE: tr8-104B is running for 2:20:19 since 2021-09-27T03:54:16 (1188168 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r6i7n[7-8],r7i0n[0-5],r7i1n[7-8],r7i2n[0-1,5,8],r7i3n2,r7i5n7,r7i6n[1-4,8],r7i7n[0-4,6-8],r8i0n[0-8],r8i1n[0-4],r8i2n8,r8i3n[0-3,8],r8i4n[0-1],r8i6n[2-3,5-6],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n[0,3-8],r9i3n[0-2,6-8],r9i4n[0-6,8],r9i5n[0-8],r9i6n[0-8],r9i7n[1-8]) iteration 10340/ 159576 | consumed samples: 856112 | elapsed time per iteration (ms): 15864.8 | learning rate: 6.000E-05 | global batch size: 304 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10350/ 159576 | consumed samples: 859152 | elapsed time per iteration (ms): 15831.6 | learning rate: 6.000E-05 | global batch size: 304 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10360/ 159576 | consumed samples: 862192 | elapsed time per iteration (ms): 15954.9 | learning rate: 6.000E-05 | global batch size: 304 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10370/ 159576 | consumed samples: 865232 | elapsed time per iteration (ms): 15871.6 | learning rate: 6.000E-05 | global batch size: 304 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10380/ 159576 | consumed samples: 868272 | elapsed time per iteration (ms): 15850.1 | learning rate: 6.000E-05 | global batch size: 304 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10390/ 159576 | consumed samples: 871312 | elapsed time per iteration (ms): 15796.9 | learning rate: 6.000E-05 | global batch size: 304 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10400/ 159576 | consumed samples: 874352 | elapsed time per iteration (ms): 16082.6 | 
learning rate: 6.000E-05 | global batch size: 304 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10410/ 159576 | consumed samples: 877392 | elapsed time per iteration (ms): 16036.3 | learning rate: 6.000E-05 | global batch size: 304 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10420/ 159576 | consumed samples: 880432 | elapsed time per iteration (ms): 15898.1 | learning rate: 6.000E-05 | global batch size: 304 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10430/ 159576 | consumed samples: 883472 | elapsed time per iteration (ms): 15687.4 | learning rate: 6.000E-05 | global batch size: 304 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10440/ 159576 | consumed samples: 886512 | elapsed time per iteration (ms): 15579.4 | learning rate: 6.000E-05 | global batch size: 304 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10450/ 159576 | consumed samples: 889552 | elapsed time per iteration (ms): 16071.4 | learning rate: 6.000E-05 | global batch size: 304 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10460/ 159576 | consumed samples: 892592 | elapsed time per iteration (ms): 15986.9 | learning rate: 6.000E-05 | global batch size: 304 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10470/ 159576 | consumed samples: 895632 | elapsed time per iteration (ms): 15775.6 | learning rate: 6.000E-05 | global batch size: 304 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10480/ 159576 | consumed samples: 898720 | elapsed time per iteration (ms): 16164.1 | learning rate: 6.000E-05 | global batch size: 320 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10490/ 159576 | consumed samples: 901920 | elapsed time per iteration (ms): 16520.7 | learning rate: 6.000E-05 | global batch size: 320 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10500/ 159576 | consumed samples: 905120 | elapsed time per iteration (ms): 16597.6 | learning rate: 6.000E-05 | global batch size: 320 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) saving checkpoint at iteration 10500 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints [2021-09-27 06:59:42,258] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/global_step10500/mp_rank_00_model_states.pt successfully saved checkpoint at iteration 10500 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints time (ms) | save-checkpoint: 21886.11 iteration 10510/ 159576 | consumed samples: 908320 | elapsed time per iteration (ms): 18676.6 | learning rate: 6.000E-05 | global batch size: 320 | loss scale: 
1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10520/ 159576 | consumed samples: 911520 | elapsed time per iteration (ms): 16429.2 | learning rate: 6.000E-05 | global batch size: 320 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10530/ 159576 | consumed samples: 914720 | elapsed time per iteration (ms): 16551.8 | learning rate: 6.000E-05 | global batch size: 320 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10540/ 159576 | consumed samples: 917920 | elapsed time per iteration (ms): 16488.6 | learning rate: 6.000E-05 | global batch size: 320 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10550/ 159576 | consumed samples: 921120 | elapsed time per iteration (ms): 16385.6 | learning rate: 6.000E-05 | global batch size: 320 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-27 07:14:45] PULSE: tr8-104B is running for 3:20:29 since 2021-09-27T03:54:16 (1188168 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r6i7n[7-8],r7i0n[0-5],r7i1n[7-8],r7i2n[0-1,5,8],r7i3n2,r7i5n7,r7i6n[1-4,8],r7i7n[0-4,6-8],r8i0n[0-8],r8i1n[0-4],r8i2n8,r8i3n[0-3,8],r8i4n[0-1],r8i6n[2-3,5-6],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n[0,3-8],r9i3n[0-2,6-8],r9i4n[0-6,8],r9i5n[0-8],r9i6n[0-8],r9i7n[1-8]) iteration 10560/ 159576 | consumed samples: 924320 | elapsed time per iteration (ms): 16352.3 | learning rate: 6.000E-05 | global batch size: 320 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10570/ 159576 | consumed samples: 927520 | elapsed time per iteration (ms): 16281.1 | learning rate: 6.000E-05 | global batch size: 320 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10580/ 159576 | consumed samples: 930720 | elapsed time per iteration (ms): 16433.2 | learning rate: 6.000E-05 | global batch size: 320 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10590/ 159576 | consumed samples: 933920 | elapsed time per iteration (ms): 16276.4 | learning rate: 6.000E-05 | global batch size: 320 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10600/ 159576 | consumed samples: 937120 | elapsed time per iteration (ms): 16510.6 | learning rate: 6.000E-05 | global batch size: 320 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10610/ 159576 | consumed samples: 940320 | elapsed time per iteration (ms): 16415.6 | learning rate: 6.000E-05 | global batch size: 320 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10620/ 159576 | consumed samples: 943520 | elapsed time per iteration (ms): 16211.4 | learning rate: 6.000E-05 | global batch size: 320 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time 
(ms) iteration 10630/ 159576 | consumed samples: 946800 | elapsed time per iteration (ms): 16664.6 | learning rate: 6.000E-05 | global batch size: 336 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10640/ 159576 | consumed samples: 950160 | elapsed time per iteration (ms): 17041.3 | learning rate: 6.000E-05 | global batch size: 336 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10650/ 159576 | consumed samples: 953520 | elapsed time per iteration (ms): 17363.3 | learning rate: 6.000E-05 | global batch size: 336 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10660/ 159576 | consumed samples: 956880 | elapsed time per iteration (ms): 16944.5 | learning rate: 6.000E-05 | global batch size: 336 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10670/ 159576 | consumed samples: 960240 | elapsed time per iteration (ms): 17142.6 | learning rate: 6.000E-05 | global batch size: 336 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10680/ 159576 | consumed samples: 963600 | elapsed time per iteration (ms): 17139.9 | learning rate: 6.000E-05 | global batch size: 336 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10690/ 159576 | consumed samples: 966960 | elapsed time per iteration (ms): 17104.6 | learning rate: 6.000E-05 | global batch size: 336 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10700/ 159576 | consumed samples: 970320 | elapsed time per iteration (ms): 16968.9 | learning rate: 6.000E-05 | global batch size: 336 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10710/ 159576 | consumed samples: 973680 | elapsed time per iteration (ms): 17071.1 | learning rate: 6.000E-05 | global batch size: 336 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10720/ 159576 | consumed samples: 977040 | elapsed time per iteration (ms): 16939.7 | learning rate: 6.000E-05 | global batch size: 336 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10730/ 159576 | consumed samples: 980400 | elapsed time per iteration (ms): 17182.0 | learning rate: 6.000E-05 | global batch size: 336 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10740/ 159576 | consumed samples: 983760 | elapsed time per iteration (ms): 16947.4 | learning rate: 6.000E-05 | global batch size: 336 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10750/ 159576 | consumed samples: 987120 | elapsed time per iteration (ms): 16887.4 | learning rate: 6.000E-05 | global batch size: 336 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | time (ms) iteration 10760/ 159576 | consumed samples: 990480 | elapsed time per iteration (ms): 17060.2 | learning rate: 6.000E-05 | global batch size: 336 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-27 08:14:50] PULSE: tr8-104B is running for 4:20:34 since 2021-09-27T03:54:16 (1188168 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r6i7n[7-8],r7i0n[0-5],r7i1n[7-8],r7i2n[0-1,5,8],r7i3n2,r7i5n7,r7i6n[1-4,8],r7i7n[0-4,6-8],r8i0n[0-8],r8i1n[0-4],r8i2n8,r8i3n[0-3,8],r8i4n[0-1],r8i6n[2-3,5-6],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n[0,3-8],r9i3n[0-2,6-8],r9i4n[0-6,8],r9i5n[0-8],r9i6n[0-8],r9i7n[1-8]) iteration 10770/ 159576 | consumed samples: 993920 | elapsed time per iteration (ms): 17207.0 | learning rate: 6.000E-05 | global batch size: 352 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10780/ 159576 | consumed samples: 997440 | elapsed time per iteration (ms): 17439.0 | learning rate: 6.000E-05 | global batch size: 352 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10790/ 159576 | consumed samples: 1000960 | elapsed time per iteration (ms): 17709.5 | learning rate: 6.000E-05 | global batch size: 352 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10800/ 159576 | consumed samples: 1004480 | elapsed time per iteration (ms): 17397.4 | learning rate: 6.000E-05 | global batch size: 352 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10810/ 159576 | consumed samples: 1008000 | elapsed time per iteration (ms): 17515.8 | learning rate: 6.000E-05 | global batch size: 352 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10820/ 159576 | consumed samples: 1011520 | elapsed time per iteration (ms): 17500.0 | learning rate: 6.000E-05 | global batch size: 352 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10830/ 159576 | consumed samples: 1015040 | elapsed time per iteration (ms): 17623.4 | learning rate: 6.000E-05 | global batch size: 352 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10840/ 159576 | consumed samples: 1018560 | elapsed time per iteration (ms): 17764.6 | learning rate: 6.000E-05 | global batch size: 352 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10850/ 159576 | consumed samples: 1022080 | elapsed time per iteration (ms): 17667.0 | learning rate: 6.000E-05 | global batch size: 352 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10860/ 159576 | consumed samples: 1025600 | elapsed time per iteration (ms): 17590.6 | learning rate: 6.000E-05 | global batch size: 352 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10870/ 159576 | consumed samples: 1029120 | elapsed 
time per iteration (ms): 17626.8 | learning rate: 6.000E-05 | global batch size: 352 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10880/ 159576 | consumed samples: 1032640 | elapsed time per iteration (ms): 17668.3 | learning rate: 6.000E-05 | global batch size: 352 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10890/ 159576 | consumed samples: 1036160 | elapsed time per iteration (ms): 17624.1 | learning rate: 6.000E-05 | global batch size: 352 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10900/ 159576 | consumed samples: 1039680 | elapsed time per iteration (ms): 17793.8 | learning rate: 6.000E-05 | global batch size: 352 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10910/ 159576 | consumed samples: 1043360 | elapsed time per iteration (ms): 18188.2 | learning rate: 6.000E-05 | global batch size: 368 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10920/ 159576 | consumed samples: 1047040 | elapsed time per iteration (ms): 18317.3 | learning rate: 6.000E-05 | global batch size: 368 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10930/ 159576 | consumed samples: 1050720 | elapsed time per iteration (ms): 18324.8 | learning rate: 6.000E-05 | global batch size: 368 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10940/ 159576 | consumed samples: 1054400 | elapsed time per iteration (ms): 18321.8 | learning rate: 6.000E-05 | global batch size: 368 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10950/ 159576 | consumed samples: 1058080 | elapsed time per iteration (ms): 18321.0 | learning rate: 6.000E-05 | global batch size: 368 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10960/ 159576 | consumed samples: 1061760 | elapsed time per iteration (ms): 18223.5 | learning rate: 6.000E-05 | global batch size: 368 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-27 09:14:51] PULSE: tr8-104B is running for 5:20:35 since 2021-09-27T03:54:16 (1188168 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r6i7n[7-8],r7i0n[0-5],r7i1n[7-8],r7i2n[0-1,5,8],r7i3n2,r7i5n7,r7i6n[1-4,8],r7i7n[0-4,6-8],r8i0n[0-8],r8i1n[0-4],r8i2n8,r8i3n[0-3,8],r8i4n[0-1],r8i6n[2-3,5-6],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n[0,3-8],r9i3n[0-2,6-8],r9i4n[0-6,8],r9i5n[0-8],r9i6n[0-8],r9i7n[1-8]) iteration 10970/ 159576 | consumed samples: 1065440 | elapsed time per iteration (ms): 18268.5 | learning rate: 6.000E-05 | global batch size: 368 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10980/ 159576 | consumed samples: 1069120 | elapsed time per iteration (ms): 18399.6 | learning rate: 6.000E-05 | global batch size: 368 | loss scale: 
1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10990/ 159576 | consumed samples: 1072800 | elapsed time per iteration (ms): 18217.5 | learning rate: 6.000E-05 | global batch size: 368 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11000/ 159576 | consumed samples: 1076480 | elapsed time per iteration (ms): 18260.1 | learning rate: 6.000E-05 | global batch size: 368 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) ------------------------------------------------------------------------------------------------- validation loss at iteration 11000 | lm loss value: 7.284734E+00 | lm loss PPL: 1.457873E+03 | ------------------------------------------------------------------------------------------------- iteration 11010/ 159576 | consumed samples: 1080160 | elapsed time per iteration (ms): 20666.6 | learning rate: 6.000E-05 | global batch size: 368 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11020/ 159576 | consumed samples: 1083840 | elapsed time per iteration (ms): 18277.2 | learning rate: 6.000E-05 | global batch size: 368 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11030/ 159576 | consumed samples: 1087552 | elapsed time per iteration (ms): 18419.3 | learning rate: 6.000E-05 | global batch size: 384 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11040/ 159576 | consumed samples: 1091392 | elapsed time per iteration (ms): 19002.0 | learning rate: 6.000E-05 | global batch size: 384 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11050/ 159576 | consumed samples: 1095232 | elapsed time per iteration (ms): 18930.9 | learning rate: 6.000E-05 | global batch size: 384 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11060/ 159576 | consumed samples: 1099072 | elapsed time per iteration (ms): 18821.2 | learning rate: 6.000E-05 | global batch size: 384 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11070/ 159576 | consumed samples: 1102912 | elapsed time per iteration (ms): 18889.6 | learning rate: 6.000E-05 | global batch size: 384 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11080/ 159576 | consumed samples: 1106752 | elapsed time per iteration (ms): 18970.4 | learning rate: 6.000E-05 | global batch size: 384 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11090/ 159576 | consumed samples: 1110592 | elapsed time per iteration (ms): 18822.6 | learning rate: 6.000E-05 | global batch size: 384 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11100/ 159576 | consumed samples: 1114432 | elapsed time per iteration (ms): 18697.2 | 
learning rate: 6.000E-05 | global batch size: 384 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11110/ 159576 | consumed samples: 1118272 | elapsed time per iteration (ms): 18737.4 | learning rate: 6.000E-05 | global batch size: 384 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11120/ 159576 | consumed samples: 1122112 | elapsed time per iteration (ms): 18949.1 | learning rate: 6.000E-05 | global batch size: 384 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11130/ 159576 | consumed samples: 1125952 | elapsed time per iteration (ms): 19003.8 | learning rate: 6.000E-05 | global batch size: 384 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11140/ 159576 | consumed samples: 1129792 | elapsed time per iteration (ms): 18836.8 | learning rate: 6.000E-05 | global batch size: 384 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11150/ 159576 | consumed samples: 1133632 | elapsed time per iteration (ms): 18941.7 | learning rate: 6.000E-05 | global batch size: 384 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11160/ 159576 | consumed samples: 1137616 | elapsed time per iteration (ms): 19465.1 | learning rate: 6.000E-05 | global batch size: 400 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-27 10:14:56] PULSE: tr8-104B is running for 6:20:40 since 2021-09-27T03:54:16 (1188168 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r6i7n[7-8],r7i0n[0-5],r7i1n[7-8],r7i2n[0-1,5,8],r7i3n2,r7i5n7,r7i6n[1-4,8],r7i7n[0-4,6-8],r8i0n[0-8],r8i1n[0-4],r8i2n8,r8i3n[0-3,8],r8i4n[0-1],r8i6n[2-3,5-6],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n[0,3-8],r9i3n[0-2,6-8],r9i4n[0-6,8],r9i5n[0-8],r9i6n[0-8],r9i7n[1-8]) iteration 11170/ 159576 | consumed samples: 1141616 | elapsed time per iteration (ms): 19493.8 | learning rate: 6.000E-05 | global batch size: 400 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11180/ 159576 | consumed samples: 1145616 | elapsed time per iteration (ms): 19504.7 | learning rate: 6.000E-05 | global batch size: 400 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11190/ 159576 | consumed samples: 1149616 | elapsed time per iteration (ms): 19555.2 | learning rate: 6.000E-05 | global batch size: 400 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11200/ 159576 | consumed samples: 1153616 | elapsed time per iteration (ms): 19490.6 | learning rate: 6.000E-05 | global batch size: 400 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11210/ 159576 | consumed samples: 1157616 | elapsed time per iteration (ms): 19532.7 | learning rate: 6.000E-05 | global batch size: 400 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11220/ 159576 | consumed samples: 1161616 | elapsed time per iteration (ms): 19261.8 | learning rate: 6.000E-05 | global batch size: 400 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11230/ 159576 | consumed samples: 1165616 | elapsed time per iteration (ms): 19376.4 | learning rate: 6.000E-05 | global batch size: 400 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11240/ 159576 | consumed samples: 1169616 | elapsed time per iteration (ms): 19505.2 | learning rate: 6.000E-05 | global batch size: 400 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11250/ 159576 | consumed samples: 1173616 | elapsed time per iteration (ms): 19535.4 | learning rate: 6.000E-05 | global batch size: 400 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11260/ 159576 | consumed samples: 1177616 | elapsed time per iteration (ms): 19415.2 | learning rate: 6.000E-05 | global batch size: 400 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11270/ 159576 | consumed samples: 1181632 | elapsed time per iteration (ms): 19446.5 | learning rate: 6.000E-05 | global batch size: 416 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11280/ 159576 | consumed samples: 1185792 | elapsed time per iteration (ms): 20068.3 | learning rate: 6.000E-05 | global batch size: 416 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11290/ 159576 | consumed samples: 1189952 | elapsed time per iteration (ms): 19947.1 | learning rate: 6.000E-05 | global batch size: 416 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11300/ 159576 | consumed samples: 1194112 | elapsed time per iteration (ms): 20002.0 | learning rate: 6.000E-05 | global batch size: 416 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11310/ 159576 | consumed samples: 1198272 | elapsed time per iteration (ms): 20006.4 | learning rate: 6.000E-05 | global batch size: 416 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11320/ 159576 | consumed samples: 1202432 | elapsed time per iteration (ms): 20000.1 | learning rate: 6.000E-05 | global batch size: 416 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11330/ 159576 | consumed samples: 1206592 | elapsed time per iteration (ms): 20065.5 | learning rate: 6.000E-05 | global batch size: 416 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11340/ 159576 | consumed samples: 1210752 | elapsed time per iteration (ms): 19952.9 | learning rate: 6.000E-05 | global batch size: 416 | 
loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-27 11:15:05] PULSE: tr8-104B is running for 7:20:49 since 2021-09-27T03:54:16 (1188168 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r6i7n[7-8],r7i0n[0-5],r7i1n[7-8],r7i2n[0-1,5,8],r7i3n2,r7i5n7,r7i6n[1-4,8],r7i7n[0-4,6-8],r8i0n[0-8],r8i1n[0-4],r8i2n8,r8i3n[0-3,8],r8i4n[0-1],r8i6n[2-3,5-6],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n[0,3-8],r9i3n[0-2,6-8],r9i4n[0-6,8],r9i5n[0-8],r9i6n[0-8],r9i7n[1-8]) iteration 11350/ 159576 | consumed samples: 1214912 | elapsed time per iteration (ms): 19989.1 | learning rate: 6.000E-05 | global batch size: 416 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11360/ 159576 | consumed samples: 1219072 | elapsed time per iteration (ms): 19868.7 | learning rate: 6.000E-05 | global batch size: 416 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11370/ 159576 | consumed samples: 1223232 | elapsed time per iteration (ms): 19987.6 | learning rate: 6.000E-05 | global batch size: 416 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11380/ 159576 | consumed samples: 1227392 | elapsed time per iteration (ms): 19947.5 | learning rate: 6.000E-05 | global batch size: 416 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11390/ 159576 | consumed samples: 1231664 | elapsed time per iteration (ms): 20206.1 | learning rate: 6.000E-05 | global batch size: 432 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11400/ 159576 | consumed samples: 1235984 | elapsed time per iteration (ms): 20686.4 | learning rate: 6.000E-05 | global batch size: 432 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11410/ 159576 | consumed samples: 1240304 | elapsed time per iteration (ms): 20763.5 | learning rate: 6.000E-05 | global batch size: 432 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11420/ 159576 | consumed samples: 1244624 | elapsed time per iteration (ms): 20718.0 | learning rate: 6.000E-05 | global batch size: 432 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11430/ 159576 | consumed samples: 1248944 | elapsed time per iteration (ms): 20629.3 | learning rate: 6.000E-05 | global batch size: 432 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11440/ 159576 | consumed samples: 1253264 | elapsed time per iteration (ms): 20735.7 | learning rate: 6.000E-05 | global batch size: 432 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11450/ 159576 | consumed samples: 1257584 | elapsed time per iteration (ms): 20551.6 | learning rate: 6.000E-05 | global batch size: 432 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | time (ms) iteration 11460/ 159576 | consumed samples: 1261904 | elapsed time per iteration (ms): 20425.6 | learning rate: 6.000E-05 | global batch size: 432 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11470/ 159576 | consumed samples: 1266224 | elapsed time per iteration (ms): 20522.3 | learning rate: 6.000E-05 | global batch size: 432 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11480/ 159576 | consumed samples: 1270544 | elapsed time per iteration (ms): 20523.5 | learning rate: 6.000E-05 | global batch size: 432 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11490/ 159576 | consumed samples: 1274864 | elapsed time per iteration (ms): 20644.7 | learning rate: 6.000E-05 | global batch size: 432 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11500/ 159576 | consumed samples: 1279312 | elapsed time per iteration (ms): 21082.2 | learning rate: 6.000E-05 | global batch size: 448 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11510/ 159576 | consumed samples: 1283792 | elapsed time per iteration (ms): 21312.4 | learning rate: 6.000E-05 | global batch size: 448 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11520/ 159576 | consumed samples: 1288272 | elapsed time per iteration (ms): 21403.7 | learning rate: 6.000E-05 | global batch size: 448 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11530/ 159576 | consumed samples: 1292752 | elapsed time per iteration (ms): 21133.4 | learning rate: 6.000E-05 | global batch size: 448 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11540/ 159576 | consumed samples: 1297232 | elapsed time per iteration (ms): 21166.4 | learning rate: 6.000E-05 | global batch size: 448 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11550/ 159576 | consumed samples: 1301712 | elapsed time per iteration (ms): 21259.6 | learning rate: 6.000E-05 | global batch size: 448 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-27 12:27:56] PULSE: tr8-104B is running for 8:33:40 since 2021-09-27T03:54:16 (1188168 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r6i7n[7-8],r7i0n[0-5],r7i1n[7-8],r7i2n[0-1,5,8],r7i3n2,r7i5n7,r7i6n[1-4,8],r7i7n[0-4,6-8],r8i0n[0-8],r8i1n[0-4],r8i2n8,r8i3n[0-3,8],r8i4n[0-1],r8i6n[2-3,5-6],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n[0,3-8],r9i3n[0-2,6-8],r9i4n[0-6,8],r9i5n[0-8],r9i6n[0-8],r9i7n[1-8]) iteration 11560/ 159576 | consumed samples: 1306192 | elapsed time per iteration (ms): 21050.1 | learning rate: 6.000E-05 | global batch size: 448 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11570/ 159576 | consumed samples: 1310672 | elapsed time per 
iteration (ms): 21058.2 | learning rate: 6.000E-05 | global batch size: 448 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11580/ 159576 | consumed samples: 1315152 | elapsed time per iteration (ms): 21057.7 | learning rate: 6.000E-05 | global batch size: 448 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11590/ 159576 | consumed samples: 1319632 | elapsed time per iteration (ms): 21281.4 | learning rate: 6.000E-05 | global batch size: 448 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11600/ 159576 | consumed samples: 1324144 | elapsed time per iteration (ms): 21318.5 | learning rate: 6.000E-05 | global batch size: 464 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11610/ 159576 | consumed samples: 1328784 | elapsed time per iteration (ms): 21769.2 | learning rate: 6.000E-05 | global batch size: 464 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11620/ 159576 | consumed samples: 1333424 | elapsed time per iteration (ms): 21656.2 | learning rate: 6.000E-05 | global batch size: 464 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11630/ 159576 | consumed samples: 1338064 | elapsed time per iteration (ms): 21947.9 | learning rate: 6.000E-05 | global batch size: 464 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11640/ 159576 | consumed samples: 1342704 | elapsed time per iteration (ms): 21602.8 | learning rate: 6.000E-05 | global batch size: 464 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11650/ 159576 | consumed samples: 1347344 | elapsed time per iteration (ms): 21770.3 | learning rate: 6.000E-05 | global batch size: 464 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11660/ 159576 | consumed samples: 1351984 | elapsed time per iteration (ms): 21697.2 | learning rate: 6.000E-05 | global batch size: 464 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11670/ 159576 | consumed samples: 1356624 | elapsed time per iteration (ms): 22004.7 | learning rate: 6.000E-05 | global batch size: 464 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11680/ 159576 | consumed samples: 1361264 | elapsed time per iteration (ms): 21654.6 | learning rate: 6.000E-05 | global batch size: 464 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11690/ 159576 | consumed samples: 1365904 | elapsed time per iteration (ms): 21840.4 | learning rate: 6.000E-05 | global batch size: 464 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11700/ 159576 
| consumed samples: 1370560 | elapsed time per iteration (ms): 21982.9 | learning rate: 6.000E-05 | global batch size: 480 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11710/ 159576 | consumed samples: 1375360 | elapsed time per iteration (ms): 22227.6 | learning rate: 6.000E-05 | global batch size: 480 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11720/ 159576 | consumed samples: 1380160 | elapsed time per iteration (ms): 22533.1 | learning rate: 6.000E-05 | global batch size: 480 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-27 13:27:56] PULSE: tr8-104B is running for 9:33:40 since 2021-09-27T03:54:16 (1188168 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r6i7n[7-8],r7i0n[0-5],r7i1n[7-8],r7i2n[0-1,5,8],r7i3n2,r7i5n7,r7i6n[1-4,8],r7i7n[0-4,6-8],r8i0n[0-8],r8i1n[0-4],r8i2n8,r8i3n[0-3,8],r8i4n[0-1],r8i6n[2-3,5-6],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n[0,3-8],r9i3n[0-2,6-8],r9i4n[0-6,8],r9i5n[0-8],r9i6n[0-8],r9i7n[1-8]) iteration 11730/ 159576 | consumed samples: 1384960 | elapsed time per iteration (ms): 22192.1 | learning rate: 6.000E-05 | global batch size: 480 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11740/ 159576 | consumed samples: 1389760 | elapsed time per iteration (ms): 22268.7 | learning rate: 6.000E-05 | global batch size: 480 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11750/ 159576 | consumed samples: 1394560 | elapsed time per iteration (ms): 22268.4 | learning rate: 6.000E-05 | global batch size: 480 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11760/ 159576 | consumed samples: 1399360 | elapsed time per iteration (ms): 22141.9 | learning rate: 6.000E-05 | global batch size: 480 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11770/ 159576 | consumed samples: 1404160 | elapsed time per iteration (ms): 21979.0 | learning rate: 6.000E-05 | global batch size: 480 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11780/ 159576 | consumed samples: 1408960 | elapsed time per iteration (ms): 22172.2 | learning rate: 6.000E-05 | global batch size: 480 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11790/ 159576 | consumed samples: 1413760 | elapsed time per iteration (ms): 22335.9 | learning rate: 6.000E-05 | global batch size: 480 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11800/ 159576 | consumed samples: 1418592 | elapsed time per iteration (ms): 22588.3 | learning rate: 6.000E-05 | global batch size: 496 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11810/ 159576 | consumed samples: 1423552 | elapsed time per iteration (ms): 22823.4 | learning rate: 6.000E-05 | 
global batch size: 496 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11820/ 159576 | consumed samples: 1428512 | elapsed time per iteration (ms): 22959.2 | learning rate: 6.000E-05 | global batch size: 496 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11830/ 159576 | consumed samples: 1433472 | elapsed time per iteration (ms): 23080.3 | learning rate: 6.000E-05 | global batch size: 496 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11840/ 159576 | consumed samples: 1438432 | elapsed time per iteration (ms): 23034.0 | learning rate: 6.000E-05 | global batch size: 496 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11850/ 159576 | consumed samples: 1443392 | elapsed time per iteration (ms): 23099.6 | learning rate: 6.000E-05 | global batch size: 496 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11860/ 159576 | consumed samples: 1448352 | elapsed time per iteration (ms): 23031.2 | learning rate: 6.000E-05 | global batch size: 496 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11870/ 159576 | consumed samples: 1453312 | elapsed time per iteration (ms): 22866.8 | learning rate: 6.000E-05 | global batch size: 496 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11880/ 159576 | consumed samples: 1458272 | elapsed time per iteration (ms): 23007.5 | learning rate: 6.000E-05 | global batch size: 496 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-27 14:27:59] PULSE: tr8-104B is running for 10:33:43 since 2021-09-27T03:54:16 (1188168 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r6i7n[7-8],r7i0n[0-5],r7i1n[7-8],r7i2n[0-1,5,8],r7i3n2,r7i5n7,r7i6n[1-4,8],r7i7n[0-4,6-8],r8i0n[0-8],r8i1n[0-4],r8i2n8,r8i3n[0-3,8],r8i4n[0-1],r8i6n[2-3,5-6],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n[0,3-8],r9i3n[0-2,6-8],r9i4n[0-6,8],r9i5n[0-8],r9i6n[0-8],r9i7n[1-8]) iteration 11890/ 159576 | consumed samples: 1463232 | elapsed time per iteration (ms): 23034.3 | learning rate: 6.000E-05 | global batch size: 496 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11900/ 159576 | consumed samples: 1468304 | elapsed time per iteration (ms): 23486.5 | learning rate: 6.000E-05 | global batch size: 512 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11910/ 159576 | consumed samples: 1473424 | elapsed time per iteration (ms): 23540.7 | learning rate: 6.000E-05 | global batch size: 512 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11920/ 159576 | consumed samples: 1478544 | elapsed time per iteration (ms): 23676.0 | learning rate: 6.000E-05 | global batch size: 512 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped 
iteration 12010/ 159576 | consumed samples: 1525024 | elapsed time per iteration (ms): 30246.8 | learning rate: 6.000E-05 | global batch size: 528 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12020/ 159576 | consumed samples: 1530304 | elapsed time per iteration (ms): 24139.3 | learning rate: 6.000E-05 | global batch size: 528 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12030/ 159576 | consumed samples: 1535584 | elapsed time per iteration (ms): 24280.0 | learning rate: 6.000E-05 | global batch size: 528 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-27 15:28:02] PULSE: tr8-104B is running for 11:33:46 since 2021-09-27T03:54:16 (1188168 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r6i7n[7-8],r7i0n[0-5],r7i1n[7-8],r7i2n[0-1,5,8],r7i3n2,r7i5n7,r7i6n[1-4,8],r7i7n[0-4,6-8],r8i0n[0-8],r8i1n[0-4],r8i2n8,r8i3n[0-3,8],r8i4n[0-1],r8i6n[2-3,5-6],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n[0,3-8],r9i3n[0-2,6-8],r9i4n[0-6,8],r9i5n[0-8],r9i6n[0-8],r9i7n[1-8])
iteration 12040/ 159576 | consumed samples: 1540864 | elapsed time per iteration (ms): 23963.9 | learning rate: 6.000E-05 | global batch size: 528 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12050/ 159576 | consumed samples: 1546144 | elapsed time per iteration (ms): 24135.8 | learning rate: 6.000E-05 | global batch size: 528 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12060/ 159576 | consumed samples: 1551424 | elapsed time per iteration (ms): 24044.3 | learning rate: 6.000E-05 | global batch size: 528 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12070/ 159576 | consumed samples: 1556704 | elapsed time per iteration (ms): 24087.4 | learning rate: 6.000E-05 | global batch size: 528 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12080/ 159576 | consumed samples: 1562064 | elapsed time per iteration (ms): 24400.0 | learning rate: 6.000E-05 | global batch size: 544 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12090/ 159576 | consumed samples: 1567504 | elapsed time per iteration (ms): 24552.7 | learning rate: 6.000E-05 | global batch size: 544 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12100/ 159576 | consumed samples: 1572944 | elapsed time per iteration (ms): 24886.7 | learning rate: 6.000E-05 | global batch size: 544 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12110/ 159576 | consumed samples: 1578384 | elapsed time per iteration (ms): 24781.4 | learning rate: 6.000E-05 | global batch size: 544 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12120/ 159576 | consumed samples: 1583824 | elapsed time per iteration (ms): 24493.1 | learning rate: 6.000E-05 | global batch size: 544 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12130/ 159576 | consumed samples: 1589264 | elapsed time per iteration (ms): 24851.3 | learning rate: 6.000E-05 | global batch size: 544 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
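Note: the PULSE heartbeats are self-consistent: the start timestamp plus the "running for" duration equals the heartbeat's own timestamp. A quick check against the 15:28:02 heartbeat above (standalone Python, not part of the PULSE tooling):

from datetime import datetime, timedelta

start = datetime.fromisoformat("2021-09-27T03:54:16")
h, m, s = (int(x) for x in "11:33:46".split(":"))
print(start + timedelta(hours=h, minutes=m, seconds=s))  # 2021-09-27 15:28:02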
iteration 12140/ 159576 | consumed samples: 1594704 | elapsed time per iteration (ms): 24746.4 | learning rate: 6.000E-05 | global batch size: 544 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12150/ 159576 | consumed samples: 1600144 | elapsed time per iteration (ms): 24578.3 | learning rate: 6.000E-05 | global batch size: 544 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12160/ 159576 | consumed samples: 1605584 | elapsed time per iteration (ms): 24469.2 | learning rate: 6.000E-05 | global batch size: 544 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12170/ 159576 | consumed samples: 1611152 | elapsed time per iteration (ms): 24994.1 | learning rate: 6.000E-05 | global batch size: 560 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-27 16:28:40] PULSE: tr8-104B is running for 12:34:24 since 2021-09-27T03:54:16 (1188168 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r6i7n[7-8],r7i0n[0-5],r7i1n[7-8],r7i2n[0-1,5,8],r7i3n2,r7i5n7,r7i6n[1-4,8],r7i7n[0-4,6-8],r8i0n[0-8],r8i1n[0-4],r8i2n8,r8i3n[0-3,8],r8i4n[0-1],r8i6n[2-3,5-6],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n[0,3-8],r9i3n[0-2,6-8],r9i4n[0-6,8],r9i5n[0-8],r9i6n[0-8],r9i7n[1-8])
iteration 12180/ 159576 | consumed samples: 1616752 | elapsed time per iteration (ms): 25275.1 | learning rate: 6.000E-05 | global batch size: 560 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12190/ 159576 | consumed samples: 1622352 | elapsed time per iteration (ms): 25176.8 | learning rate: 6.000E-05 | global batch size: 560 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12200/ 159576 | consumed samples: 1627952 | elapsed time per iteration (ms): 25167.8 | learning rate: 6.000E-05 | global batch size: 560 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12210/ 159576 | consumed samples: 1633552 | elapsed time per iteration (ms): 25057.7 | learning rate: 6.000E-05 | global batch size: 560 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12220/ 159576 | consumed samples: 1639152 | elapsed time per iteration (ms): 25147.4 | learning rate: 6.000E-05 | global batch size: 560 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12230/ 159576 | consumed samples: 1644752 | elapsed time per iteration (ms): 25198.7 | learning rate: 6.000E-05 | global batch size: 560 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12240/ 159576 | consumed samples: 1650352 | elapsed time per iteration (ms): 24894.2 | learning rate: 6.000E-05 | global batch size: 560 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12250/ 159576 | consumed samples: 1656016 | elapsed time per iteration (ms): 25306.4 | learning rate: 6.000E-05 | global batch size: 576 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12260/ 159576 | consumed samples: 1661776 | elapsed time per iteration (ms): 25946.7 | learning rate: 6.000E-05 | global batch size: 576 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12270/ 159576 | consumed samples: 1667536 | elapsed time per iteration (ms): 25714.3 | learning rate: 6.000E-05 | global batch size: 576 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12280/ 159576 | consumed samples: 1673296 | elapsed time per iteration (ms): 25863.6 | learning rate: 6.000E-05 | global batch size: 576 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12290/ 159576 | consumed samples: 1679056 | elapsed time per iteration (ms): 26038.1 | learning rate: 6.000E-05 | global batch size: 576 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12300/ 159576 | consumed samples: 1684816 | elapsed time per iteration (ms): 25611.4 | learning rate: 6.000E-05 | global batch size: 576 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12310/ 159576 | consumed samples: 1690576 | elapsed time per iteration (ms): 25819.3 | learning rate: 6.000E-05 | global batch size: 576 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-27 17:28:18] PULSE: tr8-104B is running for 13:34:02 since 2021-09-27T03:54:16 (1188168 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r6i7n[7-8],r7i0n[0-5],r7i1n[7-8],r7i2n[0-1,5,8],r7i3n2,r7i5n7,r7i6n[1-4,8],r7i7n[0-4,6-8],r8i0n[0-8],r8i1n[0-4],r8i2n8,r8i3n[0-3,8],r8i4n[0-1],r8i6n[2-3,5-6],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n[0,3-8],r9i3n[0-2,6-8],r9i4n[0-6,8],r9i5n[0-8],r9i6n[0-8],r9i7n[1-8])
iteration 12320/ 159576 | consumed samples: 1696336 | elapsed time per iteration (ms): 25983.5 | learning rate: 6.000E-05 | global batch size: 576 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12330/ 159576 | consumed samples: 1702128 | elapsed time per iteration (ms): 25674.0 | learning rate: 6.000E-05 | global batch size: 592 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12340/ 159576 | consumed samples: 1708048 | elapsed time per iteration (ms): 26437.1 | learning rate: 6.000E-05 | global batch size: 592 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
Killing subprocess 76100
Killing subprocess 76101
Killing subprocess 76102
Killing subprocess 76103
"/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in main() File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main sigkill_handler(signal.SIGTERM, None) # not coming back File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd) subprocess.CalledProcessError: Command '['/gpfswork/rech/six/commun/conda/tr1-13B/bin/python', '-u', '/gpfswork/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/pretrain_gpt.py', '--local_rank=3', '--tensor-model-parallel-size', '4', '--pipeline-model-parallel-size', '8', '--num-layers', '32', '--hidden-size', '16384', '--ffn-hidden-size', '20480', '--num-attention-heads', '32', '--seq-length', '2048', '--max-position-embeddings', '2048', '--micro-batch-size', '1', '--rampup-batch-size', '16', '16', '6_000_000', '--global-batch-size', '2048', '--train-samples', '300_000_000', '--vocab-file', '/gpfswork/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/data/gpt2-vocab.json', '--merge-file', '/gpfswork/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/data/gpt2-merges.txt', '--loss-scale', '12', '--clip-grad', '1.0', '--fp16', '--checkpoint-activations', '--seed', '42', '--optimizer', 'adam', '--adam-beta1', '0.9', '--adam-beta2', '0.999', '--adam-eps', '1e-8', '--lr', '6e-5', '--min-lr', '6e-6', '--lr-decay-style', 'cosine', '--lr-decay-samples', '126_953_125', '--lr-warmup-samples', '216_320', '--clip-grad', '1.0', '--weight-decay', '1e-1', '--exit-duration-in-mins', '1190', '--log-interval', '10', '--save-interval', '1500', '--eval-interval', '1000', '--eval-iters', '5', '--codecarbon-dir', '/gpfsscratch/rech/six/commun/checkpoints/tr8-104B/tr8-104B-logs/codecarbon', '--tensorboard-dir', '/gpfsscratch/rech/six/commun/checkpoints/tr8-104B/tr8-104B-logs/tensorboard', '--tensorboard-queue-size', '5', '--log-timers-to-tensorboard', '--log-batch-size-to-tensorboard', '--log-validation-ppl-to-tensorboard', '--save', '/gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints', '--load', '/gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints', '--data-path', '/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document', '--data-impl', 'mmap', '--split', '949,50,1', '--distributed-backend', 'nccl', '--deepspeed', '--deepspeed_config', './ds_config.1188168.json', '--zero-stage', '1', '--deepspeed-activation-checkpointing']' died with . 
srun: error: r6i5n7: task 0: Exited with exit code 1
srun: Terminating job step 1188168.0
[... several hundred "Killing subprocess <pid>" and "Main process received SIGTERM, exiting" lines from the remaining ranks, elided ...]
srun: error: r9i5n7: task 109: Exited with exit code 1
[... more "srun: error: <node>: task <n>: Exited with exit code 1" lines, one per remaining task, elided ...]
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
[... the OMP_NUM_THREADS warning repeats once per relaunched process; duplicates elided ...]
slurmstepd: error: *** STEP 1271130.0 ON r7i6n1 CANCELLED AT 2021-09-27T17:43:09 ***
Killing subprocess 32020
Killing subprocess 32021
Killing subprocess 32022
Killing subprocess 32023
Main process received SIGTERM, exiting
srun: Job step aborted: Waiting up to 62 seconds for job step to finish.
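What this burst of messages records: SLURM cancelled the job step, each node's launcher received SIGTERM, killed its four worker subprocesses, and exited. A minimal Python sketch of that pattern; this is a hypothetical illustration, not the actual launcher code:

    import signal
    import subprocess
    import sys

    # Spawn placeholder workers; a real launcher starts one rank per GPU.
    procs = [
        subprocess.Popen([sys.executable, "-c", "import time; time.sleep(3600)"])
        for _ in range(4)
    ]

    def sigterm_handler(signum, frame):
        # Mirror the log: kill every child, then exit.
        for p in procs:
            print(f"Killing subprocess {p.pid}")
            p.terminate()
        print("Main process received SIGTERM, exiting")
        sys.exit(1)

    signal.signal(signal.SIGTERM, sigterm_handler)

    for p in procs:
        p.wait()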
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meets the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops require ninja
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
[WARNING] async_io requires the libraries ['libaio-dev'], but they are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
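This report is what DeepSpeed's bundled ds_report utility prints; every rank emitted its own copy, which is why the raw log interleaves many of them. The same compatibility checks can be queried programmatically. A minimal sketch, assuming the op_builder API exposed by the DeepSpeed version in this log; check deepspeed.ops.op_builder for the builder classes your install actually provides:

    # Probe whether an op can be JIT-built on this machine, roughly what
    # the report above does per op. The builder names below are assumptions
    # about this DeepSpeed version's op_builder module.
    from deepspeed.ops.op_builder import CPUAdamBuilder, FusedAdamBuilder

    for builder in (CPUAdamBuilder(), FusedAdamBuilder()):
        # is_compatible() checks the system dependencies needed to JIT the op.
        print(builder.name, builder.is_compatible())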
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report--------------------------------------------------DeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op report-------------------------------------------------- DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninjaninja .................................... [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- op nameop name ................................ installedinstalled .... compatiblecompatible ---------------------------------------------------------------------------------------------------- cpu_adam cpu_adam............... ...............[YES] [YES]...... ......[OKAY] [OKAY] fused_adam .............fused_adam [NO]............. .......[NO] [OKAY]....... [OKAY] fused_lamb fused_lamb............. .............[NO] [NO]....... .......[OKAY] [OKAY] sparse_attnsparse_attn ........................ [NO][NO] .............. [OKAY][OKAY] transformer transformer............ ............[NO] [NO]....... .......[OKAY] [OKAY] stochastic_transformerstochastic_transformer .. [NO][NO] .............. [OKAY][OKAY] ninjaninjaninjaninja ........................................................................ [OKAY] [OKAY] [OKAY][OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- op nameop nameop nameop name ................................................................ installedinstalledinstalled installed .. .... .. compatible compatiblecompatible compatible -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- cpu_adamcpu_adam cpu_adamcpu_adam.............................. ..............................[YES][YES] [YES][YES]............ [OKAY]............[OKAY] [OKAY][OKAY] fused_adam fused_adam.............fused_adamfused_adam .............[NO].......................... [NO].......[NO][NO] .......[OKAY].............. [OKAY][OKAY][OKAY] fused_lamb ............. fused_lambfused_lamb[NO] fused_lamb ............. ................................. [NO][OKAY][NO][NO] ..................... [OKAY][OKAY][OKAY] sparse_attn ............ [NO] .......sparse_attnsparse_attn [OKAY]sparse_attn........................ [NO][NO]............ transformer .............. [NO] ............ [OKAY] [OKAY]....... [NO] [OKAY].......transformer transformer [OKAY] ............ ............ [NO]transformer[NO] .......stochastic_transformer................... [OKAY][NO][OKAY] . .......[NO] [OKAY]stochastic_transformer .......stochastic_transformer [OKAY]. . stochastic_transformer [NO] [NO] ............... [OKAY][NO][OKAY] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. 
[NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ sparse_attninstalled .............. [NO]compatible .......-------------------------------------------------- [OKAY] transformer ............ [NO] cpu_adam....... ...............[OKAY] [YES] ...... stochastic_transformer[OKAY] . [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report -------------------------------------------------- ---------------------------------------------------------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninjaJIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY] [OKAY][OKAY] [OKAY] -------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------ op name op nameop nameop name................ ................ ................ ................installedinstalled installedinstalled.... .. ..compatiblecompatible compatiblecompatible---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adam ...............cpu_adam [YES]...............cpu_adamcpu_adam ....................................[YES] [YES] [OKAY] ...... ...... [YES] [OKAY] fused_adam[OKAY] ................... [OKAY][NO] ....... [OKAY] fused_lamb fused_adam............. fused_adam .............fused_adam [NO] ............. [NO] .................... [NO] [OKAY]....... [NO].......[OKAY] [OKAY] ....... [OKAY]fused_lamb fused_lamb ............. .............fused_lambsparse_attn[NO] [NO]................................ [NO][NO].......[OKAY] [OKAY].............. [OKAY][OKAY] transformer ............ [NO] ....... 
[OKAY] sparse_attn sparse_attn............stochastic_transformer sparse_attn [NO] ......................... .......[NO] [NO] [OKAY][NO] ....... ....... ....... [OKAY]transformer[OKAY] [OKAY]............ transformertransformer[NO] ............................... [NO][NO][OKAY] .............. [OKAY][OKAY]stochastic_transformer . stochastic_transformer[NO]stochastic_transformer ....... .[OKAY] . [NO] [NO]....... .......[OKAY] [OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report-------------------------------------------------- ---------------------------------------------------------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninjaJIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja .................................... .................. ..................[OKAY][OKAY][OKAY] [OKAY]------------------------------------------------------------------------------------------------------------------------------------------------------ op name-------------------------------------------------- op name................op name ................op name installed................ installed .................. installed compatibleinstalled .. --------------------------------------------------....compatible compatible--------------------------------------------------compatible ---------------------------------------------------------------------------------------------------- cpu_adam ............... [YES] ......cpu_adam cpu_adam[OKAY]...............cpu_adam ...............[YES]............... [YES] ...... ...... [YES] [OKAY]fused_adam [OKAY] ...... ............. [OKAY][NO] .......fused_adam [OKAY]............. [NO]fused_adam fused_lamb....... fused_adam............. .............[OKAY]............. [NO] [NO] [NO]..............fused_lamb .......[OKAY][OKAY] ............. [OKAY][NO] fused_lamb....... .............fused_lamb[OKAY] ............. [NO] sparse_attn [NO] ....... ............ .......[OKAY] [NO] [OKAY] .......sparse_attn [OKAY]............ [NO] .......transformer [OKAY]............ [NO]sparse_attnsparse_attn transformer....... ............ ........................ [OKAY] [NO] [NO][NO] .......stochastic_transformer.............. [OKAY] [OKAY] [OKAY] . 
[NO]transformerstochastic_transformer transformer....... ............[OKAY] .............[NO] [NO][NO]....... ..............[OKAY] [OKAY][OKAY] stochastic_transformer stochastic_transformer. [NO] ........ [NO][OKAY] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizerasync_io ............................. [NO][NO] .............. [OKAY][NO] -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op report-------------------------------------------------- DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
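The NOTE in the report describes DeepSpeed's JIT path: ops listed as installed [NO] are built with ninja the first time they are requested. The same checks can be run by hand; the following is a minimal sketch, assuming the op builders are importable from deepspeed.ops.op_builder as in DeepSpeed 0.4.x (builder names and signatures may differ in other versions):

    from deepspeed.ops.op_builder import CPUAdamBuilder, FusedAdamBuilder

    for builder in (CPUAdamBuilder(), FusedAdamBuilder()):
        # is_compatible() is what fills the "compatible" column of the report:
        # it checks for the headers/toolchain needed to JIT-build the op.
        print(builder.NAME, builder.is_compatible())

    # load() returns the op's extension module, JIT-compiling it with ninja
    # on first use when the op was not pre-built (the "installed" column).
    cpu_adam_op = CPUAdamBuilder().load()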
[OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. utils .................. [YES] ...... [OKAY] DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja--------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 
1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninjaninjaninjaninja .................. .................................... .................. [OKAY] [OKAY][OKAY] [OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op name op name op name ................ op name................ ................ installedinstalled................ ..installed..installed compatible compatible.. .. ----------------------------------------------------------------------------------------------------compatiblecompatible ---------------------------------------------------------------------------------------------------- cpu_adamcpu_adam .............................. cpu_adamcpu_adam [YES] [YES]............... ...........................[YES] [OKAY][YES][OKAY] ...... ......[OKAY] [OKAY] fused_adam .............fused_adam [NO] .............fused_adamfused_adam....... ............. .............[OKAY][NO] [NO][NO]....... fused_lamb ....... ....... [OKAY]............. [OKAY] [OKAY] [NO] fused_lamb....... fused_lambfused_lamb ............. [OKAY]............. .............[NO] [NO][NO]....... ....... ....... [OKAY] [OKAY] [OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attn ............sparse_attntransformersparse_attn [NO]........................ ............ ....... [NO] [NO][NO] [OKAY] .............. ....... [OKAY] transformer[OKAY][OKAY] ............transformer stochastic_transformer[NO]............ transformer [NO].................... [OKAY].......[NO][NO] [OKAY].............. stochastic_transformer [OKAY][OKAY]stochastic_transformer. [NO] .stochastic_transformer....... [NO][OKAY] . ....... [NO][OKAY] ....... [OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report -------------------------------------------------- -------------------------------------------------- ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- ---------------------------------------------------------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninjaJIT compiled ops requires ninja JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report-------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninjaJIT compiled ops requires ninja-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ---------------------------------------------------------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report -------------------------------------------------- ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------DeepSpeed C++/CUDA extension op report ----------------------------------------------------------------------------------------------------JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report-------------------------------------------------- JIT compiled ops requires ninja-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report--------------------------------------------------DeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op report-------------------------------------------------- torch version .................... 1.8.1 NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- JIT compiled ops requires ninja---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report -------------------------------------------------- torch cuda version ............... 11.1 NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io async_io............... ...............[NO] [NO]....... .......[NO] [NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utils .................. utils[YES] ........................ [YES][OKAY] ...... [OKAY] quantizer ..............quantizer [NO].............. .......[NO] [OKAY]....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meets the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja

ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
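Only `cpu_adam` and `utils` were prebuilt into this DeepSpeed install; every other op is compiled just-in-time by ninja the first time it is used, which adds startup latency on each fresh allocation. If that JIT step ever becomes a bottleneck, DeepSpeed's documented `DS_BUILD_*` install-time flags can precompile the needed ops into the wheel instead (a sketch, assuming a CUDA 11.1-compatible toolchain is available on the build host):

```bash
# precompile selected ops at install time instead of JIT-compiling them
# at first use; DS_BUILD_OPS=1 would attempt to build all of them
DS_BUILD_CPU_ADAM=1 DS_BUILD_FUSED_ADAM=1 DS_BUILD_FUSED_LAMB=1 \
  pip install --no-cache-dir deepspeed
```

The `async_io` op is the one entry that also reports `[NO]` under compatibility: it will stay unbuildable until the `libaio-dev` system library named in the warning above is installed.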
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
...................[OKAY]............ [OKAY][NO][NO] transformer_inference .. [NO] ....... [OKAY] ..............stochastic_transformer [OKAY]stochastic_transformer[OKAY] . [NO].stochastic_transformer .......stochastic_transformer[NO] . [OKAY] utils .................. [YES] ...... [OKAY] ........[NO] [OKAY][NO]....... .......[OKAY] [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY] [OKAY] -------------------------------------------------- ---------------------------------------------------------------------------------------------------- --------------------------------------------------op nameop name op name ................................................ op name installedinstalled installed .................... ..installed compatiblecompatible.. --------------------------------------------------compatible-------------------------------------------------- compatible-------------------------------------------------- cpu_adam ............... cpu_adam[YES] -------------------------------------------------- ...............cpu_adam...... [YES]...............[OKAY] ......[YES] ......[OKAY] [OKAY] fused_adam ............. [NO] ....... cpu_adam[OKAY]fused_adam fused_adam............. [NO]fused_lamb............. ...................................[NO] [OKAY][NO]....... ....... [OKAY] fused_lamb[YES][OKAY] sparse_attn ............. ..................[NO]fused_lamb [NO].................... .......[OKAY][OKAY][NO] [OKAY]....... [OKAY] transformer ............ [NO] sparse_attn....... ............[OKAY] [NO]fused_adam sparse_attn ....... stochastic_transformer ............ [OKAY] ............. .[NO][NO] .......transformer[NO] .......[OKAY]............ ....... transformer[OKAY][NO] [OKAY] ................... [NO] fused_lamb[OKAY]....... [OKAY] .............stochastic_transformer .stochastic_transformer[NO] .......[NO] ........ [NO][OKAY] ....... [OKAY] [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] .......async_io [NO] ............... [NO] ....... [NO] transformer_inference .. [NO] ....... transformer_inference[OKAY] .. [NO] ....... [OKAY]utils .................. [YES] ...... [OKAY]utils .................. [YES] ......quantizer [OKAY].............. [NO] .......quantizer [OKAY].............. [NO] ....... --------------------------------------------------[OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference .. transformer_inference[NO] ......... [NO][OKAY] ....... [OKAY] utils .................. utils[YES] ........................ [YES][OKAY] ...... [OKAY] quantizer .............. quantizer[NO] ..................... [NO][OKAY] ....... 
[OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] async_io ............... [NO] ....... [NO] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inferenceutils .................... [NO][YES] ............. [OKAY][OKAY] quantizer utils.............. ..................[NO] [YES] ............. [OKAY][OKAY] quantizer ..............-------------------------------------------------- [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report--------------------------------------------------DeepSpeed C++/CUDA extension op report --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninjaJIT compiled ops requires ninja ninjaninjaninjaninja .................. ....................................[OKAY].................. [OKAY]-------------------------------------------------- [OKAY] [OKAY] --------------------------------------------------op name -------------------------------------------------- ................ --------------------------------------------------op nameop name installed ................ op name................ .. installed ................installed compatible .. installed..-------------------------------------------------- compatible..compatible --------------------------------------------------compatible-------------------------------------------------- cpu_adam-------------------------------------------------- ............... [YES]cpu_adam cpu_adam ...... cpu_adam .............................. [OKAY] ............... [YES][YES] [YES] ............ [OKAY]......[OKAY]fused_adam [OKAY] ............. [NO] ....... [OKAY]fused_adamfused_adam fused_adam..........................fused_lamb .............[NO][NO]............. .......[NO]....... [NO][OKAY] .......[OKAY]....... fused_lamb [OKAY][OKAY] .............fused_lamb .............[NO]fused_lamb [NO]....... .............sparse_attn....... [OKAY][NO]............[OKAY] [NO]....... ....... [OKAY][OKAY] sparse_attn ............transformer sparse_attn[NO]............ ...................[NO]sparse_attn [OKAY].......[NO]............ [NO][OKAY]....... transformer ....... [OKAY] ............stochastic_transformer [OKAY] [NO]transformer. transformer ................... [OKAY][NO][NO] ............ ..............[NO]stochastic_transformer .......[OKAY][OKAY] . [OKAY] [NO] stochastic_transformer....... stochastic_transformer[OKAY]. [NO]. .......[NO] [OKAY]....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... async_io[NO] ...................... [NO][NO] ....... [NO] transformer_inference transformer_inference.. ..[NO] [NO]....... .......[OKAY] [OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizer ..............quantizer [NO].............. .......[NO] [OKAY]....... [OKAY] -------------------------------------------------- -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... 
[OKAY] ninjaninjaninjaninja .................. .................. .................. ..................[OKAY][OKAY] [OKAY] [OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ --------------------------------------------------op nameop name op name ................ ................op name ................ installedinstalled ................installed.... compatiblecompatible..installed --------------------------------------------------compatible ..-------------------------------------------------- --------------------------------------------------compatible -------------------------------------------------- cpu_adam ............... [YES]cpu_adam cpu_adam.....................cpu_adam [OKAY]............... [YES]...............[YES] ......[YES] ...... [OKAY] ...... [OKAY]fused_adam [OKAY] ............. [NO] ....... [OKAY] fused_adam fused_lamb............. fused_adam.............fused_adam [NO] ............. .............[NO] [NO].......[NO] .......[OKAY]....... ....... [OKAY][OKAY] fused_lamb[OKAY] fused_lamb............. [NO].............fused_lamb .......[NO]............. sparse_attn[OKAY] ....... [NO]............ [OKAY] ....... [NO] [OKAY]....... [OKAY] sparse_attn ............transformer sparse_attn[NO]............ .......[NO]............ .......sparse_attn[OKAY] [NO] [OKAY] ................... transformer[OKAY][NO]stochastic_transformer .................... transformer[NO][OKAY][NO] .......................... transformer [OKAY] [OKAY] [NO]............ .......[NO]stochastic_transformer [OKAY]....... .[OKAY] stochastic_transformer[NO] ....... .[OKAY]stochastic_transformer [NO] ........ [NO][OKAY] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utils utils.................. ..................[YES] [YES]...... ......[OKAY] [OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. 
[OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op report-------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- async_io ............... [NO] ....... [NO] DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. transformer_inference .. [NO] ....... [OKAY] JIT compiled ops requires ninjaJIT compiled ops requires ninja -------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninja utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io async_io............... [NO]............... ....... [NO][NO] ....... [NO] transformer_inference transformer_inference.. ..[NO] [NO]....... .......[OKAY] [OKAY] /bin/sh: line 0: type: git: not found utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizer quantizer.............. ..............[NO] [NO]....... .......[OKAY] [OKAY] -------------------------------------------------- -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... async_io[NO] ...................... [NO][NO] ....... [NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer_inference .. [NO] transformer_inference....... ..[OKAY] [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] utils .................. [YES] ...... utils[OKAY] transformer_inference .. [NO] ....... [OKAY] .................. [YES] quantizer...... ..............[OKAY] [NO] ....... [OKAY]quantizer .............. [NO] .......-------------------------------------------------- [OKAY] utils .................. [YES] ...... [OKAY] -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op name op nameop nameop name ................................ ................................ installedinstalledinstalled installed .. .. .... compatiblecompatiblecompatiblecompatible ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- cpu_adam cpu_adamcpu_adam............... cpu_adam ............... [YES] .............................. ......[YES] ......[OKAY][YES] [YES] [OKAY]............ [OKAY][OKAY] fused_adam ............. [NO] ....... [OKAY] fused_adamfused_adamfused_adam ..........................fused_lamb ..........................[NO] [NO] [NO][NO].............. ....... .......[OKAY][OKAY] [OKAY] [OKAY] fused_lamb fused_lamb............. fused_lamb.............[NO] .................... 
[NO][NO]sparse_attn[OKAY] .......................... [OKAY][NO][OKAY] ....... [OKAY] transformersparse_attn ........................ [NO]sparse_attn [NO] ....... sparse_attn............ ....... ............[OKAY] [NO][OKAY][NO] ....... stochastic_transformer[OKAY]....... transformer [OKAY]............ .transformer [NO][NO] ..........................transformer [NO][OKAY] [OKAY]................... stochastic_transformer[NO][OKAY] ....... .[OKAY] stochastic_transformer [NO] ........ stochastic_transformer [OKAY][NO] ....... .[OKAY] [NO] ....... [OKAY] ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninjaJIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io async_io............... [NO]............... .......[NO] [NO]....... [NO] transformer_inference .. [NO]transformer_inference ......... [OKAY][NO] ....... [OKAY] utils .................. [YES] utils...... ..................[OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [YES] ...... [OKAY]quantizer .............. [NO]quantizer ..................... [OKAY][NO] ....... [OKAY] async_io ............... [NO] ....... [NO] -------------------------------------------------- -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install path ...............torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version torch version.................... ....................1.8.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. 1.8.1 torch cuda version ...............torch cuda version 11.1............... nvcc version11.1 async_io ............... [NO] ....... [NO] .....................nvcc version 11.2..................... deepspeed install path11.2 transformer_inference .. [NO] ....... [OKAY] ...........deepspeed install path ...........['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed info utils .................. [YES] ...... [OKAY] ...................deepspeed info 0.4.2+bc17042, bc17042, big-science................... 0.4.2+bc17042, bc17042, big-sciencedeepspeed wheel compiled w. deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] DeepSpeed general environment info: utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] -------------------------------------------------- torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO] utils....... ..................[OKAY] [YES] ...... [OKAY] utils ..................quantizer [YES].............. ......[NO] [OKAY] ....... [OKAY] quantizer ..............-------------------------------------------------- [NO] ....... [OKAY] -------------------------------------------------- ninjaninjaninjaninja ...................................................... .................. [OKAY] [OKAY] [OKAY] [OKAY]-------------------------------------------------- -------------------------------------------------- ----------------------------------------------------------------------------------------------------op name op name op nameop name ................ ................................installed................ .. installed installedinstalled compatible ......-------------------------------------------------- compatiblecompatiblecompatible ---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adam ...............cpu_adam cpu_adamcpu_adam[YES] .................................... ............... [YES][YES] [OKAY] [YES]...... ...... ......[OKAY][OKAY] [OKAY] fused_adam .............fused_adam fused_adam[NO] fused_adam ............. ....... .......................... [NO] [OKAY] [NO].......[NO] fused_lamb.......[OKAY] ....... ............. [OKAY] [OKAY][NO] fused_lamb ....................fused_lamb [OKAY][NO]fused_lamb............. ....................[NO] [OKAY] [NO] ....... .......[OKAY] [OKAY] sparse_attn ............ [NO] .......sparse_attn [OKAY]............sparse_attnsparse_attn [NO]............transformer ............ ................... [NO] [OKAY] [NO][NO] ....... ....... transformer....... [OKAY]............[OKAY] [OKAY] [NO] transformer.......stochastic_transformer transformer[OKAY]............ . ............ [NO] [NO]stochastic_transformer[NO] ....... ............... [OKAY][NO][OKAY][OKAY] ....... [OKAY] stochastic_transformerstochastic_transformer .. [NO][NO] .............. [OKAY][OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 
11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]async_io ...............async_io [NO]............... .......[NO] [NO]....... [NO]transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO]transformer_inference utils ......... ..................[OKAY][NO] [YES]....... ......[OKAY] [OKAY]utils .................. [YES]quantizer utils ...... .............. .................. [OKAY] [NO] [YES] ............. [OKAY][OKAY] quantizer .............. quantizer--------------------------------------------------[NO] ..................... [NO][OKAY] ....... [OKAY]-------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY]transformer_inference .. [NO] ....... [OKAY]utils .................. [YES] ...... [OKAY] utils .................. [YES]quantizer .................... [OKAY][NO] ....... [OKAY] quantizer .............. [NO] --------------------------------------------------....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference transformer_inference .. ..[NO] [NO]....... .......[OKAY] [OKAY] utils ..................utils [YES].................. ......[YES] [OKAY]...... DeepSpeed general environment info:  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [OKAY]quantizer async_io ............... [NO] ....... [NO] .............. [NO]quantizer ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. torch version .................... 1.8.1 transformer_inference .. [NO] ....... [OKAY] .............. [NO] .......-------------------------------------------------- [OKAY] torch cuda version ............... 
11.1 async_ioutils ................................. [NO][YES] ............. [NO][OKAY] -------------------------------------------------- nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science quantizer .............. [NO] ....... [OKAY] deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] async_io....... [NO]............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO]transformer_inference ......... [OKAY][NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] utils .................. [YES] utils...... ..................[OKAY] [YES] ...... [OKAY]quantizer quantizer .............. [NO] ....... [OKAY] .............. [NO] quantizer....... ..............[OKAY] [NO] ....... [OKAY]-------------------------------------------------- -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] DeepSpeed general environment info: -------------------------------------------------- torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utilsutils .................. ..................[YES] [YES]...... ......[OKAY] [OKAY] quantizer .............. [NO]quantizer ..................... [OKAY] [NO] ....... --------------------------------------------------[OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... 
[OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. utils .................. [YES] ...... [OKAY] async_io ............... [NO] ....... [NO] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: DeepSpeed general environment info:torch install path ............... torch install path ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']............... torch version .................... 1.8.1['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch cuda version torch version............... ....................11.1 1.8.1nvcc version ..................... torch cuda version11.2 ...............deepspeed install path 11.1........... nvcc version ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']..................... deepspeed info11.2 ...................deepspeed install path 0.4.2+bc17042, bc17042, big-science........... deepspeed wheel compiled w.['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ...... deepspeed infotorch 1.8, cuda 11.1 ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
---------------------------------------------------------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninja JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. **** Git info for Megatron: git_hash=unknown git_branch=unknown **** async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] DeepSpeed general environment info: transformer_inference .. [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] utils .................. [YES] ...... [OKAY] torch version .................... 1.8.1 quantizer .............. [NO] ....... [OKAY] torch cuda version ............... 11.1 nvcc version ..................... 11.2 -------------------------------------------------- deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info:  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] transformer_inference .. [NO] ....... [OKAY] torch version .................... 1.8.1 utils .................. [YES] ...... [OKAY] torch cuda version ............... 11.1 quantizer .............. [NO] ....... [OKAY] nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninjaninjaninjaninja ........................................................................ [OKAY] [OKAY][OKAY][OKAY] ---------------------------------------------------------------------------------------------------- -------------------------------------------------- -------------------------------------------------- op nameop name op name ................op name................................ ................installed installed.. installedinstalled ..compatible .. .. compatible--------------------------------------------------compatible compatible-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- cpu_adamcpu_adam ...............cpu_adam ............... [YES] ...............cpu_adam [YES] ...... [YES][OKAY]..................... ......[YES][OKAY] [OKAY]...... [OKAY] fused_adam ............. [NO] fused_adam .......fused_adam ............. [OKAY] ............. [NO] [NO].......fused_adamfused_lamb ............. .......[OKAY]............. [OKAY][NO][NO] fused_lamb..............fused_lamb [OKAY] .............[OKAY]............. fused_lamb[NO] [NO]............. .......[NO]....... [OKAY] ....... [OKAY][OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attntransformer sparse_attn ............ ........................[NO] [NO][NO]sparse_attn ....... ....... ....... [OKAY] [OKAY] [OKAY]............ transformer[NO] transformer............ ............[NO] stochastic_transformer....... [NO].......[OKAY]. ....... [OKAY] [NO] [OKAY] transformer .......stochastic_transformer stochastic_transformer[OKAY]............. [NO][NO] . ....... ....... [NO] [OKAY] [OKAY] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op report-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- DeepSpeed C++/CUDA extension op report ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
--------------------------------------------------JIT compiled ops requires ninja--------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report-------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] .......async_io [NO]............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO] .......utils [OKAY].................. [YES] ...... [OKAY] utils ..................quantizer [YES].............. ......[NO] [OKAY]....... [OKAY] quantizer ..............-------------------------------------------------- [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  
[WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... async_io[NO] ...................... [NO][NO] ....... [NO] transformer_inference .. [NO] ....... transformer_inference[OKAY] .. [NO] ....... [OKAY]utils .................. [YES] ......utils [OKAY].................. [YES] ......quantizer [OKAY].............. [NO] .......quantizer [OKAY].............. [NO] .......-------------------------------------------------- [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] DeepSpeed general environment info: -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- async_io ............... [NO] ....... [NO] op nameop name op nameop name ................ ................ ................................ installed installed installedinstalled .. .. .... compatible compatiblecompatible compatible -------------------------------------------------- ---------------------------------------------------------------------------------------------------- -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 utils .................. [YES] ...... [OKAY] cpu_adamcpu_adam cpu_adamcpu_adam ............... ............... ............... [YES]............... [YES] [YES][YES] ...... ...... ............ [OKAY] [OKAY][OKAY][OKAY] torch cuda version ............... 11.1 quantizer .............. [NO] ....... [OKAY] nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] -------------------------------------------------- fused_adam .............fused_adamfused_adamfused_adam [NO]....................................... .......[NO][NO][NO] [OKAY]....... ....... ....... [OKAY] [OKAY] [OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science fused_lamb deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. fused_lambfused_lamb............. .............[NO].............fused_lamb .......[NO]............. [NO] .......[OKAY][NO]....... [OKAY].......[OKAY] async_io ............... [NO] ....... [NO] [OKAY] transformer_inference .. [NO] ....... [OKAY] sparse_attn sparse_attn............sparse_attn ............sparse_attn[NO] ...............................[NO] [NO] [OKAY] [NO]....... .............. [OKAY][OKAY]transformer[OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] ............transformer transformertransformer[NO] ............ ............ ............ ....... [NO][NO] [NO] [OKAY]....... -------------------------------------------------- ....... ....... [OKAY][OKAY][OKAY] stochastic_transformer stochastic_transformer. stochastic_transformerstochastic_transformer[NO] . ........[NO]. [NO][OKAY].......[NO] .......[OKAY]....... 
[OKAY][OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... async_io[NO]async_io .............................. [NO][NO] .............. [NO][NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference transformer_inference.. ..[NO] utils.......[NO] ..................[OKAY]....... [YES] [OKAY]...... [OKAY] utils ..................utils quantizer [YES]................................ ......[YES][NO] [OKAY]...... ....... [OKAY][OKAY] quantizer .............. [NO]quantizer -------------------------------------------------- ....... .............. [OKAY][NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference ..transformer_inference [NO].. .......[NO] [OKAY]....... [OKAY] utils ..................utils [YES].................. ......[YES] [OKAY]...... [OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja-------------------------------------------------- ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
--------------------------------------------------DeepSpeed C++/CUDA extension op report-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------JIT compiled ops requires ninja --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja .................................... .................. [OKAY].................. [OKAY] [OKAY] [OKAY]-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- op name -------------------------------------------------- op nameop name ................ ................ op name................installed installed ................installed.... ..compatible installed compatible compatible--------------------------------------------------.. ----------------------------------------------------------------------------------------------------compatible -------------------------------------------------- cpu_adam ...............cpu_adam cpu_adam [YES] cpu_adam ..................... ..............................[OKAY] [YES] [YES] [YES] ...... ...... ...... [OKAY] [OKAY] [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_adamfused_adamfused_adam ............. fused_lamb .......................... [NO]............. [NO][NO] ....... [NO]....... ....... [OKAY]....... [OKAY][OKAY][OKAY] fused_lamb .............fused_lamb fused_lamb [NO] ............. ............. ....... [NO] sparse_attn[NO][OKAY] .......................... [NO][OKAY][OKAY] ....... [OKAY] sparse_attntransformer ........................ [NO][NO] .............. sparse_attn[OKAY] sparse_attn [OKAY] ............ ............transformer stochastic_transformer [NO]............[NO] [NO] ............... ....... [OKAY][OKAY][NO][OKAY] .......transformer transformerstochastic_transformer[OKAY] ........................ .[NO][NO] .......[NO]....... .......[OKAY][OKAY] [OKAY] stochastic_transformerstochastic_transformer .. [NO][NO] .............. [OKAY][OKAY] DeepSpeed general environment info: DeepSpeed general environment info:torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']torch version .................... 1.8.1torch version .................... torch cuda version1.8.1 ............... torch cuda version11.1 ...............nvcc version 11.1..................... nvcc version11.2 .....................deepspeed install path 11.2........... deepspeed install path ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']........... deepspeed info['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ................... deepspeed info0.4.2+bc17042, bc17042, big-science ................... deepspeed wheel compiled w.0.4.2+bc17042, bc17042, big-science ......deepspeed wheel compiled w. torch 1.8, cuda 11.1...... 
torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... async_io[OKAY] ............... [NO]-------------------------------------------------- ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninjaninjaninjaninja ........................................................................ [OKAY] [OKAY] [OKAY][OKAY] -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- --------------------------------------------------op name op name op name ................op name................ ................................installedinstalled installed..installed .. compatible ..--------------------------------------------------..compatible compatiblecompatible -------------------------------------------------- -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. cpu_adam ............... [YES] cpu_adamcpu_adam......cpu_adam ...............[OKAY].............................. [YES][YES][YES] .................. [OKAY][OKAY][OKAY] async_io ............... [NO] ....... [NO] fused_adam ............. [NO] ....... fused_adamfused_adam[OKAY] fused_adam transformer_inference .. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.utils .......................... .............[NO] [NO] fused_lamb [NO]....... ....... .................... [OKAY] [OKAY] [NO][OKAY] .................. [YES] ...... [OKAY] .......fused_lamb [OKAY]fused_lamb.............  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. fused_lamb .............[NO]............. [NO][NO]....... ..............[OKAY] [OKAY] [OKAY] quantizer .............. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. sparse_attn ............ [NO] ....... [OKAY] -------------------------------------------------- async_io ............... [NO] .......async_io [NO] ............... [NO] ....... [NO] transformer sparse_attnsparse_attn............ sparse_attn ........................[NO] [NO] ............[NO]....... ....... [NO] [OKAY] [OKAY]....... async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY]transformer_inference .. [NO] .......utils [OKAY].................. ....... [OKAY][OKAY] transformer_inference .. [NO] ....... [OKAY] [YES] ...... 
[OKAY] stochastic_transformertransformertransformer transformer ............ ............. ............ [NO] [NO] [NO][NO]....... ....... ....... .......[OKAY] [OKAY] [OKAY] utils .................. [YES] ...... [OKAY] utils .................. [YES]quantizer .................... [OKAY][NO] [OKAY] quantizer .............. [NO] ....... [OKAY] ....... quantizer[OKAY] .............. [NO] .......-------------------------------------------------- [OKAY] stochastic_transformer stochastic_transformer.stochastic_transformer [NO]. ........[NO] [NO][OKAY]....... .......[OKAY] [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info deepspeed install path................... ...........0.4.2+bc17042, bc17042, big-science ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utilsutils .................................... [YES] [YES]...... ......[OKAY] [OKAY] quantizer .............. quantizer[NO] ..................... [NO][OKAY] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 
0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja JIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. /bin/sh: line 0: type: git: not found async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] DeepSpeed general environment info: utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 -------------------------------------------------- nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] DeepSpeed general environment info: -------------------------------------------------- torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninjaninjaninjaninja .................................... .................................... [OKAY] [OKAY][OKAY] deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 [OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- op nameop nameop nameop name ................................................................ installed installedinstalled installed .... .. .. compatible compatiblecompatiblecompatible-------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed general environment info: cpu_adam ............... cpu_adamcpu_adam[YES] cpu_adam ............... .................................... [YES][YES][YES][OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 
1.8.1 ...... ...... ...... [OKAY] [OKAY] [OKAY] torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 fused_adam ............. [NO]fused_adam fused_adamfused_adam ....... ............. .......................... [NO][OKAY][NO] .......[NO]....... fused_lamb.......[OKAY] [OKAY] ............. DeepSpeed general environment info: [OKAY] fused_lamb [NO]fused_lamb ....................fused_lamb............. [NO] [NO] .............[OKAY] ....... ....... [NO] [OKAY] [OKAY] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 sparse_attn ............sparse_attnsparse_attn sparse_attn [NO]........................ [NO]................... [NO] [NO] .......[OKAY] ....... ....... [OKAY] torch cuda version ............... 11.1 [OKAY][OKAY]transformer nvcc version ..................... 11.2 transformer............ transformer ............ transformer[NO] [NO]............................... .......[NO][OKAY][NO] [OKAY].............. [OKAY][OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] DeepSpeed general environment info: stochastic_transformerstochastic_transformer stochastic_transformerstochastic_transformer. . [NO] .. [NO] [NO] ....... [NO] .............. [OKAY] ....... [OKAY] [OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch version .................... 1.8.1 torch cuda version ............... 11.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** torch version .................... 1.8.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference transformer_inference.. ..[NO] [NO]....... .......[OKAY] [OKAY] utils .................. [YES] ...... utils[OKAY] .................. [YES] ......quantizer [OKAY].............. [NO] .......quantizer [OKAY].............. [NO] .......-------------------------------------------------- [OKAY] -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path ............... ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] DeepSpeed general environment info:DeepSpeed general environment info: torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science torch cuda versiontorch cuda version .............................. 11.111.1 deepspeed wheel compiled w. deepspeed wheel compiled w....... 
......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]async_io ............... [NO] ....... [NO] transformer_inference .. transformer_inference[NO] ......... [NO][OKAY] ....... [OKAY] utils .................. utils[YES] ........................ [YES][OKAY] ...... [OKAY] quantizer .............. [NO]quantizer ..................... [OKAY][NO] ....... [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- ---------------------------------------------------------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.---------------------------------------------------------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja-------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path ............... ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version ............... ...............11.1 11.1 ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system nvcc version nvcc version..................... .....................11.2 11.2deepspeed install path deepspeed install path........... ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] meet the required dependencies to JIT install the op. --------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja--------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed info deepspeed info................... ................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference .. transformer_inference[NO] ......... [NO][OKAY] ....... [OKAY] utils .................. [YES] utils...... ..................[OKAY] [YES] ...... quantizer[OKAY] .............. [NO]quantizer ..................... [OKAY][NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja--------------------------------------------------DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.---------------------------------------------------------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninjaninjaninjaninja .................. .................................... .................. [OKAY][OKAY][OKAY] [OKAY]---------------------------------------------------------------------------------------------------- -------------------------------------------------- op name op name --------------------------------------------------................op name DeepSpeed general environment info: ................op name................installed ................ ..installed installed installed ..compatible .. compatible--------------------------------------------------..compatible ----------------------------------------------------------------------------------------------------compatible torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] -------------------------------------------------- torch version .................... 1.8.1DeepSpeed general environment info: torch cuda version cpu_adam ............... cpu_adam[YES] cpu_adam..................... ............... cpu_adam[OKAY] [YES] ............... 11.1 [YES] ............... ...... ...... [YES] [OKAY] [OKAY]fused_adam...... .............[OKAY] [NO] ....... [OKAY]fused_adam nvcc version torch install path..................... ...............11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] fused_adam ..........................fused_lamb [NO]fused_adam[NO]............. ....... ....................[NO] [OKAY][OKAY].......[NO] [OKAY]....... deepspeed info ...................torch version 0.4.2+bc17042, bc17042, big-science.................... fused_lamb fused_lamb [OKAY] .......................... deepspeed wheel compiled w.1.8.1 ...... torch 1.8, cuda 11.1torch cuda version [NO][NO] fused_lamb.......sparse_attn....... ............[OKAY] ............. [NO] [OKAY] [NO] ............... 11.1 nvcc version ..................... 11.2 ....... .......[OKAY] [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] transformersparse_attn ........................ [NO][NO] sparse_attn ....... ....... sparse_attn[OKAY]............ [OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science ............[NO] [NO]transformerstochastic_transformer....... ....... ............ [OKAY] [OKAY] .[NO] deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 [NO]....... transformer.......transformer[OKAY] [OKAY] ............ ............[NO] [NO]....... stochastic_transformer ....... [OKAY] [OKAY]. [NO] .......stochastic_transformer stochastic_transformer [OKAY] . 
.[NO] [NO]....... .......[OKAY] [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninjaninjaninjaninja .................. .................. .................. [OKAY].................. [OKAY] [OKAY] [OKAY]---------------------------------------------------------------------------------------------------- --------------------------------------------------op nameop name ................................op name installedinstalled................ ....--------------------------------------------------installed compatiblecompatible..op name compatible---------------------------------------------------------------------------------------------------- ................ --------------------------------------------------installed cpu_adam ...............cpu_adam.. [YES]...............cpu_adamcompatible ......[YES]............... [OKAY]...... [YES] --------------------------------------------------......[OKAY] [OKAY] fused_adam ............. [NO] .......fused_adam cpu_adam [OKAY]fused_adam ............. .............[NO] fused_lamb [NO]............... ....... ....................[YES] [OKAY] [OKAY] [NO] .............fused_lambfused_lamb [OKAY][OKAY]............. ............. DeepSpeed general environment info: [NO][NO] .............. [OKAY][OKAY] sparse_attn fused_adam............ [NO]sparse_attn sparse_attn ............................................ [NO][OKAY][NO][NO] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] .......transformer.............. ............[OKAY][OKAY] [NO] [OKAY] transformertransformer....... ........................[OKAY]fused_lamb torch version .................... 1.8.1 ninjaninjaninjaninja ........................................................................ [OKAY] [OKAY][OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- ----------------------------------------------------------------------------------------------------op nameop name [NO][NO] .............. stochastic_transformer.............[OKAY][OKAY] torch cuda version ............... 11.1 ................op name ................op name installed ................ installedinstalled.................. ....installedcompatible [NO]. stochastic_transformer[NO]stochastic_transformer ............... .[OKAY][NO] [OKAY][NO]....... nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] compatiblecompatible -------------------------------------------------- ..-------------------------------------------------- -------------------------------------------------- compatible -------------------------------------------------- .......[OKAY] [OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 cpu_adam ...............cpu_adam cpu_adam [YES] .................................... [YES]cpu_adam[YES][OKAY] ........................... 
[OKAY][YES][OKAY] ...... [OKAY]fused_adam sparse_attn ............ [NO] ....... [OKAY] ............. [NO]fused_adam fused_adam.................... [OKAY].............[NO] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] fused_adam[NO]fused_lamb....... ....... ..........................[OKAY] [OKAY] [NO] [NO] .......fused_lamb fused_lamb [OKAY] .................... ............. [NO][NO] [OKAY] ....... ....... [OKAY][OKAY] sparse_attn fused_lamb ............ [NO]............. ....... [NO][OKAY] sparse_attn.......transformer sparse_attn ............ ............ [OKAY][NO]............[NO] ....... [NO].......[OKAY] [OKAY].......transformer ............[OKAY] stochastic_transformer[NO] ........sparse_attn transformer[OKAY][NO] ............................... stochastic_transformer[NO][OKAY][NO] ....... ........[OKAY] [OKAY][NO] .......stochastic_transformer [OKAY]transformer . [NO]............ ....... [NO][OKAY] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install pathDeepSpeed general environment info: ............... DeepSpeed general environment info: torch install path ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']............... torch version .................... 1.8.1['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. torch cuda version torch version............... ....................11.1 1.8.1 nvcc version .....................torch cuda version 11.2............... deepspeed install path11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] ...........nvcc version ..................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 transformer_inferenceasync_io .. ...............[NO] [NO]....... .......[OKAY] [NO] deepspeed infodeepspeed install path .............................. 0.4.2+bc17042, bc17042, big-science ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed wheel compiled w. deepspeed info...... ...................torch 1.8, cuda 11.1 utils .................. [YES] ...... [OKAY] 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 transformer_inference quantizer.. [NO].............. .......[NO] [OKAY]....... [OKAY] utils-------------------------------------------------- .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
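The async_io [NO] result follows from the libaio warning above. A quick check for whether the dynamic loader can see libaio at all (a sketch, not taken from the logs):

    import ctypes.util

    # async_io needs libaio; on this cluster it is absent, hence [NO] above.
    if ctypes.util.find_library("aio") is None:
        print("libaio missing - async_io stays disabled")
    else:
        print("libaio found - async_io could be JIT-built")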
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
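These two lines mean the compute nodes have no `git` binary on $PATH, so Megatron's start-up probe falls back to unknown values. Roughly equivalent logic, as a sketch (the `git_info` helper is hypothetical):

    import subprocess

    def git_info(field_args):
        # Mirrors the fallback seen above: no git binary -> "unknown".
        try:
            out = subprocess.run(["git"] + field_args, capture_output=True, text=True)
            return out.stdout.strip() or "unknown"
        except FileNotFoundError:
            return "unknown"

    git_hash = git_info(["rev-parse", "--short", "HEAD"])
    git_branch = git_info(["rev-parse", "--abbrev-ref", "HEAD"])
    print(f"**** Git info for Megatron: git_hash={git_hash} git_branch={git_branch} ****")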
0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path ............... ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version torch version.................... ....................1.8.1 1.8.1 torch cuda version torch cuda version............... ...............11.1 nvcc version11.1 .....................nvcc version 11.2..................... deepspeed install path11.2 ...........deepspeed install path ...........['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed info **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ...................deepspeed info 0.4.2+bc17042, bc17042, big-science................... 0.4.2+bc17042, bc17042, big-sciencedeepspeed wheel compiled w. ......deepspeed wheel compiled w. torch 1.8, cuda 11.1...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io async_io............... [NO]............... .......[NO] [NO]....... [NO] transformer_inferencetransformer_inference .. [NO] ......... [OKAY][NO] ....... [OKAY] utils .................. [YES] utils...... ..................[OKAY] [YES] ...... [OKAY]quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inferenceutils .................... [NO][YES] ............. [OKAY][OKAY] quantizer utils.............. ..................[NO] [YES]....... ......[OKAY] [OKAY] -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 /bin/sh: line 0: type: git: not found nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 
11.2 **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ------------------------------------------------------------------------------------------------------------------------------------------------------ JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. 
Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] -------------------------------------------------- -------------------------------------------------- ----------------------------------------------------------------------------------------------------op name op name op nameop name ................................ ................ ................installed installed installedinstalled .. .... .. compatiblecompatiblecompatible ----------------------------------------------------------------------------------------------------compatible-------------------------------------------------- -------------------------------------------------- cpu_adam cpu_adamcpu_adam...............cpu_adam ............... ...............[YES]............... ......[YES][YES][YES] [OKAY] ...... ............ [OKAY][OKAY] [OKAY] fused_adam ............. fused_adamfused_adam[NO] fused_adam .................... ..........................[NO] [OKAY] [NO] /bin/sh: line 0: type: git: not found [NO]....... ....... ....... fused_lamb [OKAY] [OKAY][OKAY] ............. [NO] .......fused_lambfused_lamb fused_lamb [OKAY] ............. ............. ............. [NO] [NO] [NO] ....... ..............[OKAY] [OKAY][OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attn transformersparse_attn............sparse_attn [NO].................................... [NO].......[NO][NO] ....... .......[OKAY] ....... [OKAY] [OKAY]transformer[OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ............ transformer[NO]stochastic_transformer transformer ................... . ............[NO][OKAY][NO] [NO].............. .......stochastic_transformer[OKAY] [OKAY] [OKAY] . [NO]stochastic_transformer stochastic_transformer ....... .[OKAY] . [NO] [NO]....... .......[OKAY] [OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. 
async_io  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`................ [NO] ....... [NO] async_io ............... [NO] transformer_inference....... ..[NO] async_io[NO] ...................... [OKAY][NO] ....... [NO] transformer_inference utils.. ..................[NO] [YES]....... ......[OKAY] [OKAY] transformer_inference quantizer.. ..............[NO]utils [NO]......................... .......[OKAY][YES] [OKAY]...... [OKAY] --------------------------------------------------utils quantizer.................. ..............[YES] [NO]...... .......[OKAY] [OKAY] quantizer ..............-------------------------------------------------- [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** torch version .................... 1.8.1 torch cuda version ............... 
11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science /bin/sh: line 0: type: git: not found deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science **** Git info for Megatron: git_hash=unknown git_branch=unknown **** deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-scienceDeepSpeed general environment info: deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. torch version .................... 1.8.1 async_io ............... [NO] ....... [NO] torch cuda version ............... 11.1 transformer_inference .. [NO] ....... [OKAY] nvcc version ..................... 11.2 utils .................. [YES] ...... [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] quantizer .............. [NO] ....... [OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found DeepSpeed general environment info: DeepSpeed general environment info:torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] .................... 1.8.1torch version .................... torch cuda version1.8.1 ............... 11.1torch cuda version nvcc version............... .....................11.1 11.2nvcc version deepspeed install path ................................ 11.2 ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed install pathdeepspeed info .............................. 0.4.2+bc17042, bc17042, big-science['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed wheel compiled w.deepspeed info ......................... 0.4.2+bc17042, bc17042, big-sciencetorch 1.8, cuda 11.1 deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] async_io....... [NO]............... [NO] ....... [NO] transformer_inference ..transformer_inference [NO].. [NO]....... .......[OKAY] [OKAY] utils utils.................. ..................[YES] [YES]...... ......[OKAY] [OKAY] quantizer quantizer.............. ..............[NO] [NO]....... .......[OKAY] [OKAY] ----------------------------------------------------------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. 
...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** async_io ............... [NO] ....... [NO] DeepSpeed general environment info: transformer_inference .. [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] transformer_inference utils.. ..................[NO] [YES]....... ......[OKAY] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] [OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 quantizer utils.............. [NO].................. .......[YES] [OKAY]...... [OKAY] -------------------------------------------------- DeepSpeed general environment info: quantizer .............. [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] -------------------------------------------------- torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] /bin/sh: line 0: type: git: not found transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. torch version .................... 1.8.1 async_io ............... [NO] ....... [NO] torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] transformer_inference .. [NO] ....... [OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** utils .................. [YES] ...... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference transformer_inference.. ..[NO] [NO]....... .......[OKAY] [OKAY] utils utils.................. ..................[YES] [YES]...... ......[OKAY] [OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 
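The op-compatibility table, the async_io warning, and the "DeepSpeed general environment info" block are printed once per launcher rank, which is why they repeat heavily in the raw job log. The same report can be regenerated on demand on any node with DeepSpeed's `ds_report` utility; a minimal sketch, assuming the console script installed with DeepSpeed is on `$PATH` inside the training conda environment:

```python
# Regenerate the op-compatibility / environment report shown above on a
# single node, instead of reading it interleaved from all 256 ranks.
# Assumes the `ds_report` console script (shipped with DeepSpeed) is on
# $PATH in the active environment.
import subprocess

report = subprocess.run(["ds_report"], capture_output=True, text=True)
print(report.stdout)
```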
using world size: 256, data-parallel-size: 8, tensor-model-parallel size: 4, pipeline-model-parallel size: 8
using torch.float16 for parameters ...
------------------------ arguments ------------------------
accumulate_allreduce_grads_in_fp32 .............. False
adam_beta1 ...................................... 0.9
adam_beta2 ...................................... 0.999
adam_eps ........................................ 1e-08
adlr_autoresume ................................. False
adlr_autoresume_interval ........................ 1000
apply_query_key_layer_scaling ................... True
apply_residual_connection_post_layernorm ........ False
attention_dropout ............................... 0.1
attention_softmax_in_fp32 ....................... False
bert_binary_head ................................ True
bert_load ....................................... None
bf16 ............................................ False
bias_dropout_fusion ............................. True
bias_gelu_fusion ................................ True
biencoder_projection_dim ........................ 0
biencoder_shared_query_context_model ............ False
block_data_path ................................. None
checkpoint_activations .......................... True
checkpoint_in_cpu ............................... False
checkpoint_num_layers ........................... 1
clip_grad ....................................... 1.0
codecarbon_dir .................................. /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/tr8-104B-logs/codecarbon
consumed_train_samples .......................... 0
consumed_valid_samples .......................... 0
contigious_checkpointing ........................ False
cpu_optimizer ................................... False
cpu_torch_adam .................................. False
data_impl ....................................... mmap
data_parallel_size .............................. 8
data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
dataloader_type ................................. single
DDP_impl ........................................ local
decoder_seq_length .............................. None
deepscale ....................................... False
deepscale_config ................................ None
deepspeed ....................................... True
deepspeed_activation_checkpointing .............. True
deepspeed_config ................................ ./ds_config.1271196.json
deepspeed_mpi ................................... False
distribute_checkpointed_activations ............. False
distributed_backend ............................. nccl
embedding_path .................................. None
encoder_seq_length .............................. 2048
eod_mask_loss ................................... False
eval_interval ................................... 1000
eval_iters ...................................... 5
evidence_data_path .............................. None
exit_duration_in_mins ........................... 1190
exit_interval ................................... None
ffn_hidden_size ................................. 20480
finetune ........................................ False
fp16 ............................................ True
fp16_lm_cross_entropy ........................... False
fp32_residual_connection ........................ False
global_batch_size ............................... 2048
hidden_dropout .................................. 0.1
hidden_size ..................................... 16384
hysteresis ...................................... 2
ict_head_size ................................... None
ict_load ........................................ None
img_dim ......................................... 224
indexer_batch_size .............................. 128
indexer_log_interval ............................ 1000
init_method_std ................................. 0.02
init_method_xavier_uniform ...................... False
initial_loss_scale .............................. 4294967296
kv_channels ..................................... 512
layernorm_epsilon ............................... 1e-05
lazy_mpu_init ................................... None
load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
local_rank ...................................... 0
log_batch_size_to_tensorboard ................... True
log_interval .................................... 10
log_learning_rate_to_tensorboard ................ True
log_loss_scale_to_tensorboard ................... True
log_num_zeros_in_grad ........................... False
log_params_norm ................................. False
log_timers_to_tensorboard ....................... True
log_validation_ppl_to_tensorboard ............... True
loss_scale ...................................... 12.0
loss_scale_window ............................... 1000
lr .............................................. 6e-05
lr_decay_iters .................................. None
lr_decay_samples ................................ 126953125
lr_decay_style .................................. cosine
lr_warmup_fraction .............................. None
lr_warmup_iters ................................. 0
lr_warmup_samples ............................... 216320
make_vocab_size_divisible_by .................... 128
mask_prob ....................................... 0.15
masked_softmax_fusion ........................... True
max_position_embeddings ......................... 2048
memory_centric_tiled_linear ..................... False
merge_file ...................................... /gpfswork/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/data/gpt2-merges.txt
micro_batch_size ................................ 1
min_loss_scale .................................. 1.0
min_lr .......................................... 6e-06
mmap_warmup ..................................... False
no_load_optim ................................... None
no_load_rng ..................................... None
no_save_optim ................................... None
no_save_rng ..................................... None
num_attention_heads ............................. 32
num_channels .................................... 3
num_classes ..................................... 1000
num_layers ...................................... 32
num_layers_per_virtual_pipeline_stage ........... None
num_workers ..................................... 2
onnx_safe ....................................... None
openai_gelu ..................................... False
optimizer ....................................... adam
override_lr_scheduler ........................... False
params_dtype .................................... torch.float16
partition_activations ........................... False
patch_dim ....................................... 16
pipeline_model_parallel_size .................... 8
position_embedding_type ......................... PositionEmbeddingType.absolute
profile_backward ................................ False
query_in_block_prob ............................. 0.1
rampup_batch_size ............................... ['16', '16', '6_000_000']
rank ............................................ 0
remote_device ................................... none
reset_attention_mask ............................ False
reset_position_ids .............................. False
retriever_report_topk_accuracies ................ []
retriever_score_scaling ......................... False
retriever_seq_length ............................ 256
sample_rate ..................................... 1.0
save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
save_interval ................................... 1500
scatter_gather_tensors_in_pipeline .............. True
scattered_embeddings ............................ False
seed ............................................ 43
seq_length ...................................... 2048
sgd_momentum .................................... 0.9
short_seq_prob .................................. 0.1
split ........................................... 949,50,1
split_transformers .............................. False
synchronize_each_layer .......................... False
tensor_model_parallel_size ...................... 4
tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/tr8-104B-logs/tensorboard
tensorboard_log_interval ........................ 1
tensorboard_queue_size .......................... 5
tile_factor ..................................... 1
titles_data_path ................................ None
tokenizer_name_or_path .......................... None
tokenizer_type .................................. GPT2BPETokenizer
train_iters ..................................... None
train_samples ................................... 300000000
use_checkpoint_lr_scheduler ..................... False
use_contiguous_buffers_in_ddp ................... False
use_cpu_initialization .......................... None
use_one_sent_docs ............................... False
use_pin_memory .................................. False
virtual_pipeline_model_parallel_size ............ None
vocab_extra_ids ................................. 0
vocab_file ...................................... /gpfswork/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/data/gpt2-vocab.json
weight_decay .................................... 0.1
world_size ...................................... 256
zero_allgather_bucket_size ...................... 0.0
zero_contigious_gradients ....................... False
zero_reduce_bucket_size ......................... 0.0
zero_reduce_scatter ............................. False
zero_stage ...................................... 1
-------------------- end of arguments ---------------------
will use batch size rampup starting from global batch size 16 to global batch size 2048 with batch size increments 16 over 6000000 samples.
> building GPT2BPETokenizer tokenizer ...
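Two quick consistency checks on the dump above: the 3D-parallel layout must exactly factor the world size, and the rampup line determines how many samples are spent at each intermediate global batch size. A minimal sketch using only numbers printed in the log; the samples-per-increment arithmetic assumes Megatron-LM's linear rampup convention and is not a value the log itself prints:

```python
# Sanity-check the parallelism layout and batch-size rampup from the
# argument dump above. All constants are copied from the log.
world_size = 256
tensor_mp = 4        # tensor_model_parallel_size
pipeline_mp = 8      # pipeline_model_parallel_size

# Data parallelism is whatever remains after the tensor and pipeline splits.
data_parallel = world_size // (tensor_mp * pipeline_mp)
assert data_parallel == 8            # matches "data-parallel-size: 8"

# Rampup: global batch size 16 -> 2048 in increments of 16 over 6M samples.
start, target, increment, rampup_samples = 16, 2048, 16, 6_000_000
n_increments = (target - start) // increment           # 127 batch-size bumps
samples_per_increment = rampup_samples / n_increments  # ~47,244 samples each
print(f"{n_increments} increments, ~{samples_per_increment:.0f} samples each")
```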
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install path ...............torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version torch version .................... 1.8.1.................... 1.8.1torch cuda version ............... torch cuda version11.1 ...............nvcc version 11.1..................... nvcc version11.2 .....................deepspeed install path 11.2........... deepspeed install path['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info .............................. 0.4.2+bc17042, bc17042, big-science ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed wheel compiled w. deepspeed info...... ...................torch 1.8, cuda 11.1 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1

Every rank emits the same startup diagnostics, so a single copy of each message follows.

/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****

DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1

[WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.

async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
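The environment block above, together with the op-compatibility report that follows, is what DeepSpeed prints at startup; the standalone `ds_report` utility shipped with DeepSpeed emits the same information. A minimal sketch for capturing it programmatically, assuming only that the `ds_report` console script is on `PATH`:

```python
import subprocess

# Capture DeepSpeed's environment/op report; `ds_report` is the console
# script a DeepSpeed install provides (assumed to be on PATH here).
result = subprocess.run(["ds_report"], capture_output=True, text=True, check=True)

# Print just the environment section for a quick sanity check of the
# torch/CUDA/nvcc versions the training job will actually see.
lines = result.stdout.splitlines()
start = next((i for i, l in enumerate(lines) if "general environment info" in l), 0)
print("\n".join(lines[start:]))
```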
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
--------------------------------------------------

ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
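The NOTE in the report means that ops listed `[NO]` under `installed` are compiled just-in-time with ninja the first time they are requested. A minimal sketch of forcing such a build ahead of the first optimizer step, assuming this DeepSpeed version exposes its builders under `deepspeed.ops.op_builder` (illustrative usage, not taken from the log):

```python
# Pre-build one JIT-compiled DeepSpeed op so the first training step does
# not pay the ninja compilation cost. Assumes deepspeed.ops.op_builder
# exposes CPUAdamBuilder in this install (0.4.2 here).
from deepspeed.ops.op_builder import CPUAdamBuilder

builder = CPUAdamBuilder()
if builder.is_compatible():        # mirrors the `compatible` column above
    cpu_adam = builder.load()      # triggers the ninja JIT build if needed
    print("cpu_adam op ready:", cpu_adam is not None)
else:
    print("cpu_adam cannot be JIT-built on this system")
```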
> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
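Amid all the repeated diagnostics, the padded-vocab line is the one genuinely new piece of information: Megatron pads the tokenizer vocabulary up to a multiple that divides evenly across tensor-parallel ranks, so each rank holds an equal slice of the embedding matrix. A sketch of the arithmetic, with the multiple of 512 inferred from the numbers in the log (it corresponds to make-vocab-size-divisible-by times the tensor-parallel degree):

```python
def pad_vocab_size(orig_vocab_size: int, multiple: int) -> int:
    """Round the vocab size up to the next multiple, mirroring Megatron's
    padding so embedding rows split evenly across tensor-parallel ranks."""
    padded = orig_vocab_size
    while padded % multiple != 0:
        padded += 1
    return padded

# Numbers from the log line above: 50257 -> 50688, i.e. 431 dummy tokens.
orig = 50257
padded = pad_vocab_size(orig, multiple=512)  # 512 inferred from the log
assert padded == 50688 and padded - orig == 431
print(f"> padded vocab (size: {orig}) with {padded - orig} dummy tokens "
      f"(new size: {padded})")
```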
The remaining ranks repeat the same git, async_io, op-report, and environment messages, ending mid-block:

DeepSpeed general environment info:
torch install path ...............
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version torch version.................... ....................1.8.1 1.8.1 torch cuda version torch cuda version............... ...............11.1 11.1nvcc version nvcc version..................... .....................11.2 11.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... deepspeed wheel compiled w.torch 1.8, cuda 11.1 ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install path ...............torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version torch version.................... ....................1.8.1 1.8.1 torch cuda version torch cuda version............... ...............11.1 11.1 nvcc version nvcc version..................... .....................11.2 11.2 deepspeed install path deepspeed install path........... ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info deepspeed info................... ................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io ............... ...............async_io[NO] [NO] ............................. [NO][NO][NO] ....... [NO] transformer_inferencetransformer_inference ..transformer_inference .. [NO] .. [NO] ....... [NO] ....... [OKAY] ....... [OKAY] [OKAY] utilsutils utils .................. .................. .................. [YES] [YES] [YES] ...... ...... ...... [OKAY] [OKAY] [OKAY] quantizerquantizer quantizer............................ ..............[NO][NO] [NO].............. [OKAY][OKAY]....... 
[OKAY] ---------------------------------------------------------------------------------------------------- -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 
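The report above can be reproduced offline, which is handy when checking a new conda environment before burning job hours. A minimal sketch, assuming the `deepspeed.env_report` module that backs the `ds_report` console command:

    # Print the same torch/cuda/nvcc versions and op-compatibility table as
    # the log above, without launching a training job.
    from deepspeed.env_report import main as ds_report

    ds_report()

Per the warning, the only missing dependency here is libaio-dev; installing it (`apt install libaio-dev`) and rebuilding DeepSpeed should flip async_io to [OKAY], though the run proceeds fine without it since the async_io op appears unused here.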
> setting codecarbon ...
> initializing torch distributed ...
> setting tensorboard ...
> initializing tensor model parallel with size 4
> initializing pipeline model parallel with size 8
> setting random seeds to 43 ...
[2021-09-27 17:43:40,715] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2761 and data parallel seed: 43
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/data'
>>> done with dataset index builder. Compilation time: 0.304 seconds
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler PyTorch was built with for this platform, which is g++ on linux. Please use g++ to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension.
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! WARNING !!
warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Building extension module scaled_masked_softmax_cuda...
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Building extension module fused_mix_prec_layer_norm_cuda...
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
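The UserWarning above is emitted by torch.utils.cpp_extension on every rank because the JIT build resolves the C++ compiler as `c++` rather than the `g++` PyTorch was built with, even though on most Linux systems `c++` is a symlink to the same GCC. A sketch of one way to quiet it on future runs (an assumption on my part, not something done in this log), since the ninja build honors the CXX environment variable:

    # Hypothetical: run before megatron's fused-kernel JIT compilation so that
    # torch.utils.cpp_extension picks up g++ and skips WRONG_COMPILER_WARNING.
    import os
    os.environ.setdefault("CXX", "g++")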
>>> done with compiling and loading fused kernels. Compilation time: 22.376 seconds
time to initialize megatron (seconds): 67.410
[after megatron is initialized] datetime: 2021-09-27 17:44:03
building GPT model ...
[2021-09-27 17:44:03,479] [INFO] [utils.py:680:see_memory_usage] Before Building Model
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/cuda/memory.py:373: FutureWarning: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/cuda/memory.py:381: FutureWarning: torch.cuda.max_memory_cached has been renamed to torch.cuda.max_memory_reserved
[2021-09-27 17:44:03,481] [INFO] [utils.py:681:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
[2021-09-27 17:44:03,481] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 37.48 GB, percent = 20.0%
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=0, data=1, model=0): 4, ..., ProcessCoord(pipe=7, data=7, model=3): 255}
[the full map enumerates all 256 ranks in the same pattern: the model coordinate varies fastest (4 tensor-parallel ranks), then data (8 data-parallel replicas), then pipe (8 pipeline stages); intermediate entries elided]
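To make the elided map easier to read, here is a minimal sketch of the coordinate-to-rank layout it encodes. The constants are read off the log itself, not taken from the training scripts:

```python
# Rank layout encoded by the topology map above (a sketch; TP/DP/PP are
# read off the log, not imported from the training setup). The model
# (tensor-parallel) coordinate varies fastest, then data, then pipe.
TP, DP, PP = 4, 8, 8          # 4 x 8 x 8 = 256 GPUs

def coord_to_rank(pipe: int, data: int, model: int) -> int:
    return (pipe * DP + data) * TP + model

assert coord_to_rank(0, 0, 3) == 3     # ProcessCoord(pipe=0, data=0, model=3): 3
assert coord_to_rank(1, 2, 1) == 41    # ProcessCoord(pipe=1, data=2, model=1): 41
assert coord_to_rank(7, 7, 3) == 255   # last entry in the map
```

Consecutive ranks (e.g. 0-3) therefore form one tensor-parallel group, which typically keeps the most communication-intensive traffic between neighbouring GPUs in the allocation.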
[2021-09-27 17:44:04,887] [INFO] [module.py:360:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=7
     0: _to_float16
     1: EmbeddingPipe
     2:
     3: ParallelTransformerLayerPipe
     4: ParallelTransformerLayerPipe
     5: ParallelTransformerLayerPipe
     6: ParallelTransformerLayerPipe
stage=1 layers=4
     7: ParallelTransformerLayerPipe
     8: ParallelTransformerLayerPipe
     9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
[stages 2-6 are partitioned identically: four ParallelTransformerLayerPipe layers each, covering layers 11-14, 15-18, 19-22, 23-26 and 27-30]
stage=7 layers=8
    31: ParallelTransformerLayerPipe
    32: ParallelTransformerLayerPipe
    33: ParallelTransformerLayerPipe
    34: ParallelTransformerLayerPipe
    35:
    36: MixedFusedLayerNorm
    37: EmbeddingPipe
    38: float16_to_fp32
  loss: CrossEntropy
> number of parameters on (tensor, pipeline) model parallel rank: 1986465792 on every tensor rank of pipeline stage 0, 1745293312 on stages 1-6, and 1986498560 on stage 7; stages 0 and 7 are heavier because they also carry the embeddings (and, for stage 7, the final layernorm) [the 32 per-rank lines are condensed into this summary]
[2021-09-27 17:44:06,136] [INFO] [utils.py:680:see_memory_usage] After Building Model
[2021-09-27 17:44:06,136] [INFO] [utils.py:681:see_memory_usage] MA 3.77 GB Max_MA 3.79 GB CA 3.79 GB Max_CA 4 GB
[2021-09-27 17:44:06,137] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 37.66 GB, percent = 20.1%
setting training iterations to 159576
> learning rate decay style: cosine
DeepSpeed is enabled.
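The `type:transformer` partitioning above balances only the transformer layers across stages; the surrounding cast/embedding/norm layers ride along with the first and last stages. Below is a sketch of that balancing rule, inferred from the stage bounds in the log rather than taken from the actual DeepSpeed implementation; the two unnamed entries (2: and 35:) print blank in the log, so "<lambda>" is just a placeholder here:

```python
# Sketch of the "type:transformer" partition rule (inferred from the log's
# stage bounds, not the verbatim DeepSpeed code): only transformer layers
# count toward the balance; stage boundaries are chosen so each of the 8
# stages gets an equal share of them.
layers = (["_to_float16", "EmbeddingPipe", "<lambda>"]
          + ["ParallelTransformerLayerPipe"] * 32
          + ["<lambda>", "MixedFusedLayerNorm", "EmbeddingPipe", "float16_to_fp32"])
num_stages = 8
weights = [1 if "Transformer" in name else 0 for name in layers]
per_stage = sum(weights) // num_stages       # 32 transformer layers -> 4 per stage

bounds, seen = [0], 0
for i, w in enumerate(weights):
    seen += w
    if seen == per_stage * len(bounds) and len(bounds) < num_stages:
        bounds.append(i + 1)
bounds.append(len(layers))
assert bounds == [0, 7, 11, 15, 19, 23, 27, 31, 39]   # stage=0 [0,7) ... stage=7 [31,39)
```

This is why stages 0 and 7 report 7 and 8 layers respectively while every other stage reports 4, and why their parameter counts come out slightly higher.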
[2021-09-27 17:44:06,204] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.4.2+bc17042, git-hash=bc17042, git-branch=big-science
[2021-09-27 17:44:06,292] [INFO] [engine.py:179:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-09-27 17:44:06,292] [INFO] [engine.py:736:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-09-27 17:44:06,292] [INFO] [engine.py:741:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-09-27 17:44:06,292] [INFO] [engine.py:750:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2021-09-27 17:44:06,292] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2021-09-27 17:44:06,292] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2021-09-27 17:44:06,292] [INFO] [stage2.py:106:__init__] Reduce bucket size 500000000
[2021-09-27 17:44:06,292] [INFO] [stage2.py:107:__init__] Allgather bucket size 500000000
[2021-09-27 17:44:06,292] [INFO] [stage2.py:108:__init__] CPU Offload: False
[2021-09-27 17:44:06,292] [INFO] [stage2.py:109:__init__] Round robin gradient partitioning: False
[2021-09-27 17:44:11,004] [INFO] [stage2.py:419:__init__] optimizer state initialized
[2021-09-27 17:44:11,004] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2021-09-27 17:44:11,004] [INFO] [engine.py:553:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2021-09-27 17:44:11,005] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2021-09-27 17:44:11,005] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-09-27 17:44:11,005] [INFO] [config.py:900:print] DeepSpeedEngine configuration:
[the repeated "[2021-09-27 17:44:11,00x] [INFO] [config.py:904:print]" prefix is stripped from the entries below]
  activation_checkpointing_config  { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false }
  aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
  allreduce_always_fp32 ........ False
  amp_enabled .................. False
  amp_params ................... False
  checkpoint_tag_validation_enabled True
  checkpoint_tag_validation_fail False
  disable_allgather ............ False
  dump_state ................... False
  dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
  eigenvalue_enabled ........... False
  eigenvalue_gas_boundary_resolution 1
  eigenvalue_layer_name ........ bert.encoder.layer
  eigenvalue_layer_num ......... 0
  eigenvalue_max_iter .......... 100
  eigenvalue_stability ......... 1e-06
  eigenvalue_tol ............... 0.01
  eigenvalue_verbose ........... False
  elasticity_enabled ........... False
  flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null }
  fp16_enabled ................. True
  fp16_mixed_quantize .......... False
  global_rank .................. 0
  gradient_accumulation_steps .. 256
  gradient_clipping ............ 1.0
  gradient_predivide_factor .... 1.0
  initial_dynamic_scale ........ 4096
  loss_scale ................... 0
  memory_breakdown ............. False
  optimizer_legacy_fusion ...... False
  optimizer_name ............... None
  optimizer_params ............. None
  pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
  pld_enabled .................. False
  pld_params ................... False
  prescale_gradients ........... False
  quantize_change_rate ......... 0.001
  quantize_groups .............. 1
  quantize_offset .............. 1000
  quantize_period .............. 1000
  quantize_rounding ............ 0
  quantize_start_bits .......... 16
  quantize_target_bits ......... 8
  quantize_training_enabled .... False
  quantize_type ................ 0
  quantize_verbose ............. False
  scheduler_name ............... None
  scheduler_params ............. None
  sparse_attention ............. None
  sparse_gradients_enabled ..... False
  steps_per_print .............. 2000
  tensorboard_enabled .......... False
  tensorboard_job_name ......... DeepSpeedJobName
  tensorboard_output_path ......
  train_batch_size ............. 2048
  train_micro_batch_size_per_gpu 1
  use_quantizer_kernel ......... False
  wall_clock_breakdown ......... False
  world_size ................... 8
  zero_allow_untested_optimizer False
  zero_config .................. { "stage": 1, "contiguous_gradients": false, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": true, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_fp16_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false }
  zero_enabled ................. True
  zero_optimization_stage ...... 1
[2021-09-27 17:44:11,007] [INFO] [config.py:906:print] json = {
  "train_micro_batch_size_per_gpu": 1,
  "train_batch_size": 2.048000e+03,
  "gradient_clipping": 1.0,
  "zero_optimization": { "stage": 1 },
  "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12 },
  "steps_per_print": 2.000000e+03,
  "wall_clock_breakdown": false
}
[2021-09-27 17:44:11,007] [INFO] [engine.py:76:__init__] CONFIG: micro_batches=256 micro_batch_size=1
[2021-09-27 17:44:11,311] [INFO] [engine.py:134:__init__] RANK=0 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[equivalent engine.py:134 lines are printed for the other 31 (pipeline stage, tensor rank) pairs and are elided: STAGE_PARAMS=1745293312 (1745.293M) on stages 1-6, 1986498560 (1986.499M) on stage 7, with identical TOTAL_PARAMS and UNIQUE_PARAMS throughout]
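Three different batch-size numbers appear in the prints above, so one consistency check is worth spelling out. Note that `world_size` in the config print is evidently the data-parallel size, not the 256 total GPUs (TP and PP ranks replicate the same data-parallel position). A sketch, with all values read off the config print:

```python
# Batch-size bookkeeping from the DeepSpeed config print (a sanity check on
# the logged values, not DeepSpeed code).
micro_batch_per_gpu = 1      # train_micro_batch_size_per_gpu
grad_accum_steps    = 256    # gradient_accumulation_steps (micro_batches=256)
dp_size             = 8      # "world_size" in the config print
assert micro_batch_per_gpu * grad_accum_steps * dp_size == 2048  # train_batch_size
```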
> using checkpoint value 6e-05 for learning rate
> using checkpoint value 6e-06 for minimum learning rate
> using checkpoint value 216320 for warmup iterations
> using checkpoint value 126953125 for total number of iterations
> using checkpoint value cosine for decay style
[each of the 256 ranks then logs "successfully loaded 8 ZeRO state_dicts for rank N" followed by "loading 8 zero partition checkpoints for rank N"; the roughly 512 interleaved per-rank lines are elided]
checkpoint version 3.0
successfully loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints at iteration 6210
time (ms) | load-checkpoint: 56578.08
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-09-27 17:45:07
> building train, validation, and test datasets ...
> datasets target sizes (minimum size):
    train:      300000000
    validation: 1638400
    test:       10240
> building train, validation, and test datasets for GPT ...
> building dataset index ...
    reading sizes...
    reading pointers...
> building train, validation, and test datasets ...
> datasets target sizes (minimum size):
    train:      300000000
    validation: 1638400
    test:       10240
> building train, validation, and test datasets for GPT ...
> building dataset index ...
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
> finished creating indexed dataset in 0.174718 seconds
    number of documents: 304230423
> dataset split:
    train:      document indices in [0, 288714672) total of 288714672 documents
    validation: document indices in [288714672, 303926193) total of 15211521 documents
    test:       document indices in [303926193, 304230423) total of 304230 documents
> WARNING: could not find index map files, building the indices on rank 0 ...
> last epoch number of samples (36925554) is smaller than 80% of number of samples per epoch (131537223), setting separate_last_epoch to True
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-27 17:46:37 CEST)" was missed by 0:00:21.460713
> elapsed time to build and save doc-idx mapping (seconds): 74.353737
    using:
        number of documents: 288714672
        number of epochs: 3
        sequence length: 2048
        total number of samples: 394611669
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-27 17:47:37 CEST)" was missed by 0:00:11.662010
> elapsed time to build and save sample-idx mapping (seconds): 24.775998
> building shuffle index with split [0, 263074446) and [263074446, 394611669) ...
> elapsed time to build and save shuffle-idx mapping (seconds): 26.026031
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_43s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_43s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_43s_shuffle_idx.npy
    loaded indexed file in 0.089 seconds
    total number of samples: 394611670
    total number of epochs: 3
> WARNING: could not find index map files, building the indices on rank 0 ...
> only one epoch required, setting separate_last_epoch to False
> elapsed time to build and save doc-idx mapping (seconds): 0.979826
    using:
        number of documents: 15211521
        number of epochs: 1
        sequence length: 2048
        total number of samples: 6927160
> elapsed time to build and save sample-idx mapping (seconds): 0.364344
> building shuffle index with split [0, 6927160) and [6927160, 6927160) ...
> elapsed time to build and save shuffle-idx mapping (seconds): 0.312714
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_43s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_43s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_43s_shuffle_idx.npy
    loaded indexed file in 0.034 seconds
    total number of samples: 6927161
    total number of epochs: 1
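The document ranges logged above are consistent with a 949/50/1 train/valid/test split over the 304230423 OSCAR documents (the split weights are inferred from the numbers, not quoted from the config). Megatron rounds each boundary and then shifts all boundaries by the accumulated rounding error, which is why train ends at 288714672 rather than 288714671; a sketch of that bookkeeping:

    # Re-derives the split boundaries logged above (Megatron-style rounding
    # with an end correction). Weights [949, 50, 1] are an inference.
    def split_indices(weights, size):
        total = sum(weights)
        idx = [0]
        for w in weights:
            idx.append(idx[-1] + int(round(w / total * float(size))))
        diff = idx[-1] - size              # accumulated rounding error
        for i in range(1, len(idx)):
            idx[i] -= diff                 # shift every boundary by it
        return idx
    print(split_indices([949, 50, 1], 304_230_423))
    # -> [0, 288714672, 303926193, 304230423], matching the logged ranges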
> WARNING: could not find index map files, building the indices on rank 0 ...
> only one epoch required, setting separate_last_epoch to False
> elapsed time to build and save doc-idx mapping (seconds): 0.019056
    using:
        number of documents: 304230
        number of epochs: 1
        sequence length: 2048
        total number of samples: 137383
> elapsed time to build and save sample-idx mapping (seconds): 0.007505
> building shuffle index with split [0, 137383) and [137383, 137383) ...
> elapsed time to build and save shuffle-idx mapping (seconds): 0.021865
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_shuffle_idx.npy
    loaded indexed file in 0.110 seconds
    total number of samples: 137384
    total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-09-27 17:47:20
done with setup ...
training ...
time (ms) | model-and-optimizer-setup: 64587.82 | train/valid/test-data-iterators-setup: 131511.20
[before the start of training step] datetime: 2021-09-27 17:47:20
[2021-09-27 17:47:20,277] [INFO] [checkpointing.py:408:forward] Activation Checkpointing Information
[2021-09-27 17:47:20,277] [INFO] [checkpointing.py:409:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-09-27 17:47:20,277] [INFO] [checkpointing.py:412:forward] ----contiguous Memory Checkpointing False with 32 total layers
[2021-09-27 17:47:20,277] [INFO] [checkpointing.py:415:forward] ----Synchronization False
[2021-09-27 17:47:20,277] [INFO] [checkpointing.py:416:forward] ----Profiling time in checkpointing False
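The "Activation Checkpointing Information" banner above reflects DeepSpeed's activation-checkpointing settings for this run (everything off, 32 transformer layers checkpointed). Roughly, the flags map onto deepspeed.checkpointing.configure like this; a sketch, not the run's launcher code:

    import deepspeed
    deepspeed.checkpointing.configure(
        mpu_=None,                      # model-parallel unit; Megatron passes its mpu here
        partition_activations=False,    # "----Partition Activations False"
        contiguous_checkpointing=False, # "----contiguous Memory Checkpointing False"
        num_checkpoints=32,             # "... with 32 total layers"
        checkpoint_in_cpu=False,        # "CPU CHECKPOINTING False"
        synchronize=False,              # "----Synchronization False"
        profile=False,                  # "----Profiling time in checkpointing False"
    )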
[Rank 225] (after 6220 iterations) memory (MB) | allocated: 7107.7119140625 | max allocated: 11885.68798828125 | reserved: 22492.0 | max reserved: 22492.0
[Rank 226] (after 6220 iterations) memory (MB) | allocated: 7107.7119140625 | max allocated: 11885.6884765625 | reserved: 22492.0 | max reserved: 22492.0
[Rank 1] (after 6220 iterations) memory (MB) | allocated: 6689.83056640625 | max allocated: 13899.01416015625 | reserved: 23278.0 | max reserved: 23278.0
[Rank 2] (after 6220 iterations) memory (MB) | allocated: 6689.83056640625 | max allocated: 13899.01416015625 | reserved: 23278.0 | max reserved: 23278.0
[Rank 0] (after 6220 iterations) memory (MB) | allocated: 6689.83056640625 | max allocated: 13899.01416015625 | reserved: 23246.0 | max reserved: 23246.0
[Rank 224] (after 6220 iterations) memory (MB) | allocated: 7107.7119140625 | max allocated: 11885.68994140625 | reserved: 22492.0 | max reserved: 22492.0
[Rank 227] (after 6220 iterations) memory (MB) | allocated: 7107.7119140625 | max allocated: 11885.6884765625 | reserved: 21700.0 | max reserved: 21700.0
iteration 6220/ 159576 | consumed samples: 194400 | elapsed time per iteration (ms): 19180.4 | learning rate: 5.378E-05 | global batch size: 80 | lm loss: 6.355129E+00 | loss scale: 4096.0 | grad norm: 93535.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[Rank 3] (after 6220 iterations) memory (MB) | allocated: 6689.83056640625 | max allocated: 13899.01416015625 | reserved: 23278.0 | max reserved: 23278.0
[Rank 33] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 12082.4677734375 | reserved: 20130.0 | max reserved: 20130.0
[Rank 66] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11810.46728515625 | reserved: 19950.0 | max reserved: 19950.0
[Rank 34] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 12082.4677734375 | reserved: 20250.0 | max reserved: 20250.0
[Rank 98] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11538.466796875 | reserved: 19886.0 | max reserved: 19886.0
[Rank 130] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11266.46630859375 | reserved: 19338.0 | max reserved: 19338.0
[Rank 97] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11538.466796875 | reserved: 19402.0 | max reserved: 19402.0
[Rank 161] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10994.4658203125 | reserved: 20170.0 | max reserved: 20170.0
[Rank 129] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11266.46630859375 | reserved: 19050.0 | max reserved: 19050.0
[Rank 193] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10722.46533203125 | reserved: 18826.0 | max reserved: 18826.0
[Rank 65] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11810.46728515625 | reserved: 19582.0 | max reserved: 19582.0
[Rank 194] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10722.46533203125 | reserved: 18970.0 | max reserved: 18970.0
[Rank 162] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10994.4658203125 | reserved: 19146.0 | max reserved: 19146.0
[Rank 32] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 12082.4677734375 | reserved: 20676.0 | max reserved: 20676.0
[Rank 96] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11538.466796875 | reserved: 20296.0 | max reserved: 20296.0
[Rank 64] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11810.46728515625 | reserved: 20392.0 | max reserved: 20392.0
[Rank 35] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 12082.4677734375 | reserved: 20030.0 | max reserved: 20030.0
[Rank 160] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10994.4658203125 | reserved: 19636.0 | max reserved: 19636.0
[Rank 192] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10722.46533203125 | reserved: 19012.0 | max reserved: 19012.0
[Rank 128] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11266.46630859375 | reserved: 20008.0 | max reserved: 20008.0
[Rank 99] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11538.466796875 | reserved: 19870.0 | max reserved: 19870.0
[Rank 67] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11810.46728515625 | reserved: 19582.0 | max reserved: 19582.0
[Rank 131] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11266.46630859375 | reserved: 19278.0 | max reserved: 19278.0
[Rank 195] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10722.46533203125 | reserved: 18970.0 | max reserved: 18970.0
[Rank 163] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10994.4658203125 | reserved: 18826.0 | max reserved: 18826.0
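Per-rank lines like the block above can be produced with stock PyTorch CUDA counters; a hypothetical helper (names are illustrative, not Megatron's):

    import torch

    def report_memory(rank: int, iteration: int) -> str:
        # Mirrors the "[Rank N] (after I iterations) memory (MB) | ..." format.
        mb = 1024.0 * 1024.0
        return (
            f"[Rank {rank}] (after {iteration} iterations) memory (MB)"
            f" | allocated: {torch.cuda.memory_allocated() / mb}"
            f" | max allocated: {torch.cuda.max_memory_allocated() / mb}"
            f" | reserved: {torch.cuda.memory_reserved() / mb}"
            f" | max reserved: {torch.cuda.max_memory_reserved() / mb}"
        )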
iteration 6230/ 159576 | consumed samples: 195200 | elapsed time per iteration (ms): 17628.9 | learning rate: 5.400E-05 | global batch size: 80 | lm loss: 6.325471E+00 | loss scale: 4096.0 | grad norm: 104626.566 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6240/ 159576 | consumed samples: 196000 | elapsed time per iteration (ms): 17585.3 | learning rate: 5.423E-05 | global batch size: 80 | lm loss: 6.313773E+00 | loss scale: 4096.0 | grad norm: 104488.785 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6250/ 159576 | consumed samples: 196800 | elapsed time per iteration (ms): 17683.9 | learning rate: 5.445E-05 | global batch size: 80 | lm loss: 6.302388E+00 | loss scale: 4096.0 | grad norm: 99404.120 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6260/ 159576 | consumed samples: 197600 | elapsed time per iteration (ms): 17834.3 | learning rate: 5.467E-05 | global batch size: 80 | lm loss: 6.322264E+00 | loss scale: 4096.0 | grad norm: 134601.608 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6270/ 159576 | consumed samples: 198400 | elapsed time per iteration (ms): 17647.5 | learning rate: 5.489E-05 | global batch size: 80 | lm loss: 6.319476E+00 | loss scale: 4096.0 | grad norm: 142879.794 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6280/ 159576 | consumed samples: 199200 | elapsed time per iteration (ms): 17607.4 | learning rate: 5.511E-05 | global batch size: 80 | lm loss: 6.321982E+00 | loss scale: 4096.0 | grad norm: 114136.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6290/ 159576 | consumed samples: 200000 | elapsed time per iteration (ms): 17636.6 | learning rate: 5.534E-05 | global batch size: 80 | lm loss: 6.272703E+00 | loss scale: 4096.0 | grad norm: 101011.949 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6300/ 159576 | consumed samples: 200800 | elapsed time per iteration (ms): 17537.9 | learning rate: 5.556E-05 | global batch size: 80 | lm loss: 6.295881E+00 | loss scale: 4096.0 | grad norm: 116874.031 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6310/ 159576 | consumed samples: 201600 | elapsed time per iteration (ms): 17634.4 | learning rate: 5.578E-05 | global batch size: 80 | lm loss: 6.324175E+00 | loss scale: 4096.0 | grad norm: 115938.037 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6320/ 159576 | consumed samples: 202400 | elapsed time per iteration (ms): 17796.6 | learning rate: 5.600E-05 | global batch size: 80 | lm loss: 6.301260E+00 | loss scale: 4096.0 | grad norm: 128639.863 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6330/ 159576 | consumed samples: 203200 | elapsed time per iteration (ms): 17684.4 | learning rate: 5.622E-05 | global batch size: 80 | lm loss: 6.325212E+00 | loss scale: 4096.0 | grad norm: 122331.136 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6340/ 159576 | consumed samples: 204000 | elapsed time per iteration (ms): 17751.1 | learning rate: 5.645E-05 | global batch size: 80 | lm loss: 6.315152E+00 | loss scale: 4096.0 | grad norm: 107257.166 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
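A quick sanity check on the bookkeeping in these lines: each one summarizes 10 iterations at global batch size 80, so consumed samples advance by 800 per line (195200, 196000, ...), and throughput over the whole job in this stretch is roughly 4.5 samples/s:

    # Pure illustration of the arithmetic behind the counters above.
    log_interval = 10                    # iterations per log line
    global_batch_size = 80
    ms_per_iter = 17628.9                # logged at iteration 6230
    print(log_interval * global_batch_size)             # 800 samples per line
    print(global_batch_size / (ms_per_iter / 1000.0))   # ~4.54 samples/s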
[2021-09-27 18:28:25] PULSE: tr8-104B is running for 44:59 since 2021-09-27T17:43:26 (1271196 on 'gpu_p13' partition (r7i7n[6-8],r8i0n[0-8],r8i1n[0-4],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n0,r9i4n8,r9i5n[0-8],r9i6n[0-8],r9i7n[3-6])
iteration 6350/ 159576 | consumed samples: 204800 | elapsed time per iteration (ms): 17472.1 | learning rate: 5.667E-05 | global batch size: 80 | lm loss: 6.305837E+00 | loss scale: 4096.0 | grad norm: 92922.842 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6360/ 159576 | consumed samples: 205600 | elapsed time per iteration (ms): 17585.4 | learning rate: 5.689E-05 | global batch size: 80 | lm loss: 6.291708E+00 | loss scale: 4096.0 | grad norm: 128015.015 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6370/ 159576 | consumed samples: 206400 | elapsed time per iteration (ms): 17756.4 | learning rate: 5.711E-05 | global batch size: 80 | lm loss: 6.336868E+00 | loss scale: 4096.0 | grad norm: 132675.737 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6380/ 159576 | consumed samples: 207200 | elapsed time per iteration (ms): 17470.3 | learning rate: 5.733E-05 | global batch size: 80 | lm loss: 6.319473E+00 | loss scale: 4096.0 | grad norm: 121903.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6390/ 159576 | consumed samples: 208000 | elapsed time per iteration (ms): 17849.6 | learning rate: 5.755E-05 | global batch size: 80 | lm loss: 6.295473E+00 | loss scale: 4096.0 | grad norm: 108842.830 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6400/ 159576 | consumed samples: 208800 | elapsed time per iteration (ms): 17525.6 | learning rate: 5.778E-05 | global batch size: 80 | lm loss: 6.305953E+00 | loss scale: 4096.0 | grad norm: 110142.091 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6410/ 159576 | consumed samples: 209600 | elapsed time per iteration (ms): 17695.6 | learning rate: 5.800E-05 | global batch size: 80 | lm loss: 6.327058E+00 | loss scale: 4096.0 | grad norm: 149204.626 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6420/ 159576 | consumed samples: 210400 | elapsed time per iteration (ms): 17590.8 | learning rate: 5.822E-05 | global batch size: 80 | lm loss: 6.301820E+00 | loss scale: 4096.0 | grad norm: 90947.052 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6430/ 159576 | consumed samples: 211200 | elapsed time per iteration (ms): 17793.7 | learning rate: 5.844E-05 | global batch size: 80 | lm loss: 6.343626E+00 | loss scale: 4096.0 | grad norm: 345234.052 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6440/ 159576 | consumed samples: 212000 | elapsed time per iteration (ms): 17631.2 | learning rate: 5.866E-05 | global batch size: 80 | lm loss: 6.323440E+00 | loss scale: 4096.0 | grad norm: 96087.714 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6450/ 159576 | consumed samples: 212800 | elapsed time per iteration (ms): 17688.1 | learning rate: 5.889E-05 | global batch size: 80 | lm loss: 6.310754E+00 | loss scale: 4096.0 | grad norm: 142702.659 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6460/ 159576 | consumed samples: 213600 | elapsed time per iteration (ms): 17884.9 | learning rate: 5.911E-05 | global batch size: 80 | lm loss: 6.326996E+00 | loss scale: 4096.0 | grad norm: 139353.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6470/ 159576 | consumed samples: 214400 | elapsed time per iteration (ms): 17777.5 | learning rate: 5.933E-05 | global batch size: 80 | lm loss: 6.303541E+00 | loss scale: 4096.0 | grad norm: 163735.847 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6480/ 159576 | consumed samples: 215200 | elapsed time per iteration (ms): 17758.4 | learning rate: 5.955E-05 | global batch size: 80 | lm loss: 6.318764E+00 | loss scale: 4096.0 | grad norm: 122570.514 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6490/ 159576 | consumed samples: 216000 | elapsed time per iteration (ms): 17864.1 | learning rate: 5.977E-05 | global batch size: 80 | lm loss: 6.307048E+00 | loss scale: 4096.0 | grad norm: 116946.724 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6500/ 159576 | consumed samples: 216800 | elapsed time per iteration (ms): 17901.7 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.315722E+00 | loss scale: 4096.0 | grad norm: 93922.032 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6510/ 159576 | consumed samples: 217600 | elapsed time per iteration (ms): 17582.8 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.323491E+00 | loss scale: 4096.0 | grad norm: 148357.794 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6520/ 159576 | consumed samples: 218400 | elapsed time per iteration (ms): 17725.2 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.330975E+00 | loss scale: 4096.0 | grad norm: 103909.494 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6530/ 159576 | consumed samples: 219200 | elapsed time per iteration (ms): 17788.4 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.330465E+00 | loss scale: 4096.0 | grad norm: 112690.620 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6540/ 159576 | consumed samples: 220000 | elapsed time per iteration (ms): 17722.2 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.325342E+00 | loss scale: 4096.0 | grad norm: 74738.856 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6550/ 159576 | consumed samples: 220800 | elapsed time per iteration (ms): 17778.1 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.338161E+00 | loss scale: 4096.0 | grad norm: 92386.024 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-27 19:28:18] PULSE: tr8-104B is running for 1:44:52 since 2021-09-27T17:43:26 (1271196 on 'gpu_p13' partition (r7i7n[6-8],r8i0n[0-8],r8i1n[0-4],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n0,r9i4n8,r9i5n[0-8],r9i6n[0-8],r9i7n[3-6])
iteration 6560/ 159576 | consumed samples: 221600 | elapsed time per iteration (ms): 17633.8 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.346842E+00 | loss scale: 4096.0 | grad norm: 91412.181 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
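The learning rate climbs linearly in consumed samples and plateaus at 6.000E-05 from iteration 6500 on (~216k samples consumed). A generic linear-warmup-in-samples rule reproduces the ramp approximately; the warmup length below is inferred from the log, not quoted from the config:

    MAX_LR = 6.000e-05
    WARMUP_SAMPLES = 216_320   # assumed; consistent with where the logged LR plateaus

    def lr_at(consumed_samples: int) -> float:
        # Linear warmup in samples, then constant.
        if consumed_samples >= WARMUP_SAMPLES:
            return MAX_LR
        return MAX_LR * consumed_samples / WARMUP_SAMPLES

    # lr_at(194_400) ~= 5.39e-05, close to the 5.378E-05 logged at iteration 6220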
iteration 6570/ 159576 | consumed samples: 222400 | elapsed time per iteration (ms): 17585.9 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.354182E+00 | loss scale: 4096.0 | grad norm: 106016.821 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6580/ 159576 | consumed samples: 223200 | elapsed time per iteration (ms): 17723.8 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.339022E+00 | loss scale: 4096.0 | grad norm: 99292.123 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6590/ 159576 | consumed samples: 224000 | elapsed time per iteration (ms): 17636.7 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.343359E+00 | loss scale: 4096.0 | grad norm: 142334.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6600/ 159576 | consumed samples: 224800 | elapsed time per iteration (ms): 17663.9 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.340461E+00 | loss scale: 4096.0 | grad norm: 152141.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6610/ 159576 | consumed samples: 225600 | elapsed time per iteration (ms): 17548.3 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.323914E+00 | loss scale: 4096.0 | grad norm: 170495.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6620/ 159576 | consumed samples: 226400 | elapsed time per iteration (ms): 17566.2 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.304215E+00 | loss scale: 4096.0 | grad norm: 160242.764 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6630/ 159576 | consumed samples: 227200 | elapsed time per iteration (ms): 17951.1 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.312865E+00 | loss scale: 4096.0 | grad norm: 104923.640 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6640/ 159576 | consumed samples: 228000 | elapsed time per iteration (ms): 17693.7 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.337115E+00 | loss scale: 4096.0 | grad norm: 162544.865 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6650/ 159576 | consumed samples: 228800 | elapsed time per iteration (ms): 17707.3 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.327879E+00 | loss scale: 4096.0 | grad norm: 80497.049 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6660/ 159576 | consumed samples: 229600 | elapsed time per iteration (ms): 17584.5 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.404206E+00 | loss scale: 4096.0 | grad norm: 136886.090 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6670/ 159576 | consumed samples: 230400 | elapsed time per iteration (ms): 17615.2 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.359778E+00 | loss scale: 4096.0 | grad norm: 123501.796 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6680/ 159576 | consumed samples: 231200 | elapsed time per iteration (ms): 17812.0 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.318851E+00 | loss scale: 4096.0 | grad norm: 118146.851 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6690/ 159576 | consumed samples: 232000 | elapsed time per iteration (ms): 17690.8 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.324978E+00 | loss scale: 4096.0 | grad norm: 127513.155 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6700/ 159576 | consumed samples: 232800 | elapsed time per iteration (ms): 17679.3 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.312429E+00 | loss scale: 4096.0 | grad norm: 141251.517 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6710/ 159576 | consumed samples: 233600 | elapsed time per iteration (ms): 17730.1 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.304575E+00 | loss scale: 8192.0 | grad norm: 354806.488 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6720/ 159576 | consumed samples: 234400 | elapsed time per iteration (ms): 17817.5 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.343853E+00 | loss scale: 8192.0 | grad norm: 400003.537 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6730/ 159576 | consumed samples: 235200 | elapsed time per iteration (ms): 17886.0 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.329220E+00 | loss scale: 8192.0 | grad norm: 354798.775 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6740/ 159576 | consumed samples: 236000 | elapsed time per iteration (ms): 17869.3 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.341031E+00 | loss scale: 8192.0 | grad norm: 452433.886 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6750/ 159576 | consumed samples: 236912 | elapsed time per iteration (ms): 18328.8 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.325079E+00 | loss scale: 8192.0 | grad norm: 272354.067 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6760/ 159576 | consumed samples: 237872 | elapsed time per iteration (ms): 17158.6 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.350076E+00 | loss scale: 4096.0 | grad norm: 109464.543 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-27 20:32:07] PULSE: tr8-104B is running for 2:48:41 since 2021-09-27T17:43:26 (1271196 on 'gpu_p13' partition (r7i7n[6-8],r8i0n[0-8],r8i1n[0-4],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n0,r9i4n8,r9i5n[0-8],r9i6n[0-8],r9i7n[3-6])
iteration 6770/ 159576 | consumed samples: 238832 | elapsed time per iteration (ms): 18779.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.347258E+00 | loss scale: 4096.0 | grad norm: 151362.578 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6780/ 159576 | consumed samples: 239792 | elapsed time per iteration (ms): 18764.2 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.483617E+00 | loss scale: 4096.0 | grad norm: 144409.728 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
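The jump from global batch size 80 to 96 at iteration ~6750 (and later 96 to 112 and 112 to 128 further down) fits a linear batch-size ramp in increments of 16. A hypothetical reconstruction in Megatron's --rampup-batch-size terms; the constants below are inferred from the transition points in this log, not quoted from the config:

    # Assumed ramp: start 16, increment 16, spread over 6M samples, full GBS 2048.
    START, INCREMENT, RAMP_SAMPLES, FULL_GBS = 16, 16, 6_000_000, 2048
    steps = (FULL_GBS - START) // INCREMENT      # 127 increments
    samples_per_step = RAMP_SAMPLES // steps     # ~47_244 samples per step

    def global_batch_size(consumed_samples: int) -> int:
        step = min(consumed_samples // samples_per_step, steps)
        return min(START + step * INCREMENT, FULL_GBS)

    # global_batch_size(236_000) -> 80, global_batch_size(236_912) -> 96,
    # matching the 6740 -> 6750 transition above.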
iteration 6790/ 159576 | consumed samples: 240752 | elapsed time per iteration (ms): 18830.0 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.459402E+00 | loss scale: 4096.0 | grad norm: 106762.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6800/ 159576 | consumed samples: 241712 | elapsed time per iteration (ms): 18594.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.457979E+00 | loss scale: 4096.0 | grad norm: 159826.924 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6810/ 159576 | consumed samples: 242672 | elapsed time per iteration (ms): 18590.0 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.445743E+00 | loss scale: 4096.0 | grad norm: 104586.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6820/ 159576 | consumed samples: 243632 | elapsed time per iteration (ms): 18726.4 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.371418E+00 | loss scale: 4096.0 | grad norm: 181059.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6830/ 159576 | consumed samples: 244592 | elapsed time per iteration (ms): 18734.3 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.385859E+00 | loss scale: 4096.0 | grad norm: 126958.593 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6840/ 159576 | consumed samples: 245552 | elapsed time per iteration (ms): 18634.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.351850E+00 | loss scale: 4096.0 | grad norm: 154126.591 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6850/ 159576 | consumed samples: 246512 | elapsed time per iteration (ms): 18587.1 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.341198E+00 | loss scale: 4096.0 | grad norm: 133262.949 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6860/ 159576 | consumed samples: 247472 | elapsed time per iteration (ms): 19013.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.317137E+00 | loss scale: 4096.0 | grad norm: 101860.571 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6870/ 159576 | consumed samples: 248432 | elapsed time per iteration (ms): 18789.2 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.332655E+00 | loss scale: 4096.0 | grad norm: 467416.787 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6880/ 159576 | consumed samples: 249392 | elapsed time per iteration (ms): 18654.5 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.385090E+00 | loss scale: 4096.0 | grad norm: 154062.615 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6890/ 159576 | consumed samples: 250352 | elapsed time per iteration (ms): 18644.4 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.355402E+00 | loss scale: 4096.0 | grad norm: 154349.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6900/ 159576 | consumed samples: 251312 | elapsed time per iteration (ms): 18495.6 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.365808E+00 | loss scale: 4096.0 | grad norm: 95313.572 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6910/ 159576 | consumed samples: 252272 | elapsed time per iteration (ms): 18802.1 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.598378E+00 | loss scale: 4096.0 | grad norm: 84678.880 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6920/ 159576 | consumed samples: 253232 | elapsed time per iteration (ms): 18641.0 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.314456E+00 | loss scale: 4096.0 | grad norm: 122716.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6930/ 159576 | consumed samples: 254192 | elapsed time per iteration (ms): 18564.1 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 9.121927E+00 | loss scale: 4096.0 | grad norm: 283384.130 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6940/ 159576 | consumed samples: 255152 | elapsed time per iteration (ms): 18549.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 1.023865E+01 | loss scale: 4096.0 | grad norm: 42359.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6950/ 159576 | consumed samples: 256112 | elapsed time per iteration (ms): 17675.8 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 9.249577E+00 | loss scale: 2048.0 | grad norm: 78368.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6960/ 159576 | consumed samples: 257072 | elapsed time per iteration (ms): 18443.5 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 8.389180E+00 | loss scale: 2048.0 | grad norm: 40490.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6970/ 159576 | consumed samples: 258032 | elapsed time per iteration (ms): 18545.1 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.529938E+00 | loss scale: 2048.0 | grad norm: 14218.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-27 21:35:01] PULSE: tr8-104B is running for 3:51:35 since 2021-09-27T17:43:26 (1271196 on 'gpu_p13' partition (r7i7n[6-8],r8i0n[0-8],r8i1n[0-4],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n0,r9i4n8,r9i5n[0-8],r9i6n[0-8],r9i7n[3-6])
iteration 6980/ 159576 | consumed samples: 258992 | elapsed time per iteration (ms): 18379.4 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.102215E+00 | loss scale: 2048.0 | grad norm: 18580.148 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6990/ 159576 | consumed samples: 259952 | elapsed time per iteration (ms): 18355.5 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.018941E+00 | loss scale: 2048.0 | grad norm: 17882.180 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7000/ 159576 | consumed samples: 260912 | elapsed time per iteration (ms): 18505.9 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.942125E+00 | loss scale: 2048.0 | grad norm: 26860.562 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
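The loss-scale movements in this stretch (4096 up to 8192 around iteration 6710, back down to 4096, then down to 2048 near the loss spike at ~6930-6950) are the signature of dynamic fp16 loss scaling. A generic sketch of the standard scheme, not the exact Megatron/DeepSpeed implementation:

    class DynamicLossScaler:
        """Halve the scale on gradient overflow; double it after `window`
        consecutive overflow-free steps."""
        def __init__(self, scale=4096.0, window=1000, min_scale=1.0):
            self.scale, self.window, self.min_scale = scale, window, min_scale
            self.good_steps = 0

        def update(self, overflow: bool) -> None:
            if overflow:
                self.scale = max(self.scale / 2, self.min_scale)  # e.g. 4096 -> 2048 near iter 6950
                self.good_steps = 0
            else:
                self.good_steps += 1
                if self.good_steps % self.window == 0:
                    self.scale *= 2                               # e.g. 4096 -> 8192 near iter 6710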
------------------------------------------------------------------------------------------------
validation loss at iteration 7000 | lm loss value: 6.872679E+00 | lm loss PPL: 9.655315E+02 |
------------------------------------------------------------------------------------------------
iteration 7010/ 159576 | consumed samples: 261872 | elapsed time per iteration (ms): 19970.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.816376E+00 | loss scale: 2048.0 | grad norm: 40294.075 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7020/ 159576 | consumed samples: 262832 | elapsed time per iteration (ms): 18648.1 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.821559E+00 | loss scale: 2048.0 | grad norm: 25012.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7030/ 159576 | consumed samples: 263792 | elapsed time per iteration (ms): 18478.0 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.893867E+00 | loss scale: 2048.0 | grad norm: 39565.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7040/ 159576 | consumed samples: 264752 | elapsed time per iteration (ms): 18670.1 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.871474E+00 | loss scale: 2048.0 | grad norm: 22832.888 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7050/ 159576 | consumed samples: 265712 | elapsed time per iteration (ms): 18521.8 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.875928E+00 | loss scale: 2048.0 | grad norm: 26237.022 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7060/ 159576 | consumed samples: 266672 | elapsed time per iteration (ms): 18543.5 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.827568E+00 | loss scale: 2048.0 | grad norm: 31639.445 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7070/ 159576 | consumed samples: 267632 | elapsed time per iteration (ms): 18564.4 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.711889E+00 | loss scale: 2048.0 | grad norm: 46310.481 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7080/ 159576 | consumed samples: 268592 | elapsed time per iteration (ms): 18629.8 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.683693E+00 | loss scale: 2048.0 | grad norm: 31484.550 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7090/ 159576 | consumed samples: 269552 | elapsed time per iteration (ms): 18473.8 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.627121E+00 | loss scale: 2048.0 | grad norm: 45017.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7100/ 159576 | consumed samples: 270512 | elapsed time per iteration (ms): 18806.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.627071E+00 | loss scale: 2048.0 | grad norm: 57880.707 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7110/ 159576 | consumed samples: 271472 | elapsed time per iteration (ms): 18537.3 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.608931E+00 | loss scale: 2048.0 | grad norm: 67724.648 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
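The reported perplexity is just the exponential of the lm loss; checking the validation line above:

    import math
    lm_loss = 6.872679
    print(math.exp(lm_loss))   # ~965.53, matching "lm loss PPL: 9.655315E+02"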
iteration 7120/ 159576 | consumed samples: 272432 | elapsed time per iteration (ms): 18556.3 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.592625E+00 | loss scale: 2048.0 | grad norm: 67655.063 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7130/ 159576 | consumed samples: 273392 | elapsed time per iteration (ms): 18620.1 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.769730E+00 | loss scale: 2048.0 | grad norm: 50594.550 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7140/ 159576 | consumed samples: 274352 | elapsed time per iteration (ms): 18517.9 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.749163E+00 | loss scale: 2048.0 | grad norm: 30940.535 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7150/ 159576 | consumed samples: 275312 | elapsed time per iteration (ms): 18726.4 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.695554E+00 | loss scale: 2048.0 | grad norm: 49756.042 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-27 22:31:42] PULSE: tr8-104B is running for 4:48:16 since 2021-09-27T17:43:26 (1271196 on 'gpu_p13' partition (r7i7n[6-8],r8i0n[0-8],r8i1n[0-4],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n0,r9i4n8,r9i5n[0-8],r9i6n[0-8],r9i7n[3-6])
iteration 7160/ 159576 | consumed samples: 276272 | elapsed time per iteration (ms): 18567.4 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.630823E+00 | loss scale: 2048.0 | grad norm: 46573.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7170/ 159576 | consumed samples: 277232 | elapsed time per iteration (ms): 18787.6 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.637067E+00 | loss scale: 2048.0 | grad norm: 47650.692 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7180/ 159576 | consumed samples: 278192 | elapsed time per iteration (ms): 18669.9 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.663966E+00 | loss scale: 2048.0 | grad norm: 54677.698 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7190/ 159576 | consumed samples: 279152 | elapsed time per iteration (ms): 18711.4 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.603532E+00 | loss scale: 2048.0 | grad norm: 75914.515 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7200/ 159576 | consumed samples: 280112 | elapsed time per iteration (ms): 18682.4 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.571133E+00 | loss scale: 2048.0 | grad norm: 74379.166 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7210/ 159576 | consumed samples: 281072 | elapsed time per iteration (ms): 18622.6 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.584048E+00 | loss scale: 2048.0 | grad norm: 75888.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7220/ 159576 | consumed samples: 282032 | elapsed time per iteration (ms): 18555.6 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.554535E+00 | loss scale: 2048.0 | grad norm: 90934.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7230/ 159576 | consumed samples: 282992 | elapsed time per iteration (ms): 18600.5 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.558411E+00 | loss scale: 2048.0 | grad norm: 54832.822 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7240/ 159576 | consumed samples: 284032 | elapsed time per iteration (ms): 19119.6 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.585645E+00 | loss scale: 2048.0 | grad norm: 116769.600 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7250/ 159576 | consumed samples: 285152 | elapsed time per iteration (ms): 19421.9 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.554094E+00 | loss scale: 2048.0 | grad norm: 79780.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7260/ 159576 | consumed samples: 286272 | elapsed time per iteration (ms): 19643.2 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.545351E+00 | loss scale: 2048.0 | grad norm: 153165.121 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7270/ 159576 | consumed samples: 287392 | elapsed time per iteration (ms): 19873.2 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.548807E+00 | loss scale: 2048.0 | grad norm: 96725.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7280/ 159576 | consumed samples: 288512 | elapsed time per iteration (ms): 19830.3 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.532312E+00 | loss scale: 2048.0 | grad norm: 85054.846 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7290/ 159576 | consumed samples: 289632 | elapsed time per iteration (ms): 19469.1 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.535855E+00 | loss scale: 2048.0 | grad norm: 66255.480 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7300/ 159576 | consumed samples: 290752 | elapsed time per iteration (ms): 19578.9 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.583752E+00 | loss scale: 2048.0 | grad norm: 61901.507 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7310/ 159576 | consumed samples: 291872 | elapsed time per iteration (ms): 19646.2 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.539584E+00 | loss scale: 2048.0 | grad norm: 68238.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7320/ 159576 | consumed samples: 292992 | elapsed time per iteration (ms): 19642.5 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.526649E+00 | loss scale: 2048.0 | grad norm: 69527.941 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7330/ 159576 | consumed samples: 294112 | elapsed time per iteration (ms): 19508.3 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.514026E+00 | loss scale: 2048.0 | grad norm: 63745.755 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7340/ 159576 | consumed samples: 295232 | elapsed time per iteration (ms): 19676.4 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.519949E+00 | loss scale: 2048.0 | grad norm: 96730.566 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-27 23:32:04] PULSE: tr8-104B is running for 5:48:38 since 2021-09-27T17:43:26 (1271196 on 'gpu_p13' partition (r7i7n[6-8],r8i0n[0-8],r8i1n[0-4],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n0,r9i4n8,r9i5n[0-8],r9i6n[0-8],r9i7n[3-6])
iteration 7350/ 159576 | consumed samples: 296352 | elapsed time per iteration (ms): 19539.0 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.510521E+00 | loss scale: 2048.0 | grad norm: 95201.544 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7360/ 159576 | consumed samples: 297472 | elapsed time per iteration (ms): 19834.3 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.532115E+00 | loss scale: 2048.0 | grad norm: 269153.773 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7370/ 159576 | consumed samples: 298592 | elapsed time per iteration (ms): 19564.2 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.501956E+00 | loss scale: 2048.0 | grad norm: 89998.728 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7380/ 159576 | consumed samples: 299712 | elapsed time per iteration (ms): 19672.7 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.522272E+00 | loss scale: 2048.0 | grad norm: 75724.702 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7390/ 159576 | consumed samples: 300832 | elapsed time per iteration (ms): 19562.0 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.511443E+00 | loss scale: 2048.0 | grad norm: 89537.752 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7400/ 159576 | consumed samples: 301952 | elapsed time per iteration (ms): 19728.4 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.534271E+00 | loss scale: 2048.0 | grad norm: 79036.616 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7410/ 159576 | consumed samples: 303072 | elapsed time per iteration (ms): 19731.8 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.550716E+00 | loss scale: 2048.0 | grad norm: 60002.009 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7420/ 159576 | consumed samples: 304192 | elapsed time per iteration (ms): 19733.0 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.546501E+00 | loss scale: 2048.0 | grad norm: 69147.056 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7430/ 159576 | consumed samples: 305312 | elapsed time per iteration (ms): 19483.2 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.560014E+00 | loss scale: 2048.0 | grad norm: 75450.439 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7440/ 159576 | consumed samples: 306432 | elapsed time per iteration (ms): 19613.5 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.523249E+00 | loss scale: 2048.0 | grad norm: 104393.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
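The PULSE lines carry a compressed SLURM node list. A small illustrative helper (not part of the training code) to expand one group:

    import re

    def expand(group: str) -> list[str]:
        # "r7i7n[6-8]" -> ["r7i7n6", "r7i7n7", "r7i7n8"]; plain names pass through.
        m = re.fullmatch(r"(.+)\[(.+)\]", group)
        if not m:
            return [group]                      # e.g. "r9i2n0"
        prefix, spec = m.groups()
        nodes = []
        for part in spec.split(","):
            lo, _, hi = part.partition("-")
            for n in range(int(lo), int(hi or lo) + 1):
                nodes.append(f"{prefix}{n}")
        return nodes

    # expand("r9i0n[0-6,8]") -> ['r9i0n0', ..., 'r9i0n6', 'r9i0n8']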
iteration 7450/ 159576 | consumed samples: 307552 | elapsed time per iteration (ms): 19763.9 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.510474E+00 | loss scale: 4096.0 | grad norm: 189305.762 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7460/ 159576 | consumed samples: 308672 | elapsed time per iteration (ms): 19871.2 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.501906E+00 | loss scale: 4096.0 | grad norm: 277069.826 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7470/ 159576 | consumed samples: 309792 | elapsed time per iteration (ms): 18903.0 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.497433E+00 | loss scale: 4096.0 | grad norm: 225644.862 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7480/ 159576 | consumed samples: 310912 | elapsed time per iteration (ms): 19707.4 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.488033E+00 | loss scale: 4096.0 | grad norm: 230163.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7490/ 159576 | consumed samples: 312032 | elapsed time per iteration (ms): 19720.9 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.505843E+00 | loss scale: 4096.0 | grad norm: 238654.612 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7500/ 159576 | consumed samples: 313152 | elapsed time per iteration (ms): 18950.8 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.477815E+00 | loss scale: 2048.0 | grad norm: 106401.719 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 7500 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
[2021-09-28 00:24:01,519] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/global_step7500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 7500 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
time (ms) | save-checkpoint: 17115.61
iteration 7510/ 159576 | consumed samples: 314272 | elapsed time per iteration (ms): 21118.3 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.494813E+00 | loss scale: 2048.0 | grad norm: 111065.941 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7520/ 159576 | consumed samples: 315392 | elapsed time per iteration (ms): 19805.8 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.508061E+00 | loss scale: 2048.0 | grad norm: 108163.665 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-28 00:32:54] PULSE: tr8-104B is running for 6:49:28 since 2021-09-27T17:43:26 (1271196 on 'gpu_p13' partition (r7i7n[6-8],r8i0n[0-8],r8i1n[0-4],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n0,r9i4n8,r9i5n[0-8],r9i6n[0-8],r9i7n[3-6])
iteration 7530/ 159576 | consumed samples: 316512 | elapsed time per iteration (ms): 19675.1 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.531902E+00 | loss scale: 2048.0 | grad norm: 113133.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7540/ 159576 | consumed samples: 317632 | elapsed time per iteration (ms): 19542.7 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.512622E+00 | loss scale: 2048.0 | grad norm: 124840.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
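The save at iteration 7500 above wrote .../global_step7500/mp_rank_00_model_states.pt plus one ZeRO optimizer partition per rank. From the client side this is a single DeepSpeed call; a sketch, again assuming `model_engine` from deepspeed.initialize():

    save_dir = "/gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints"
    model_engine.save_checkpoint(
        save_dir,
        tag="global_step7500",              # matches the directory in the log
        client_state={"iteration": 7500},   # restored via load_checkpoint on resume
    )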
iteration 7550/ 159576 | consumed samples: 318752 | elapsed time per iteration (ms): 19516.2 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.501436E+00 | loss scale: 2048.0 | grad norm: 133229.950 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7560/ 159576 | consumed samples: 319872 | elapsed time per iteration (ms): 19503.1 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.490542E+00 | loss scale: 2048.0 | grad norm: 71964.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7570/ 159576 | consumed samples: 320992 | elapsed time per iteration (ms): 19421.6 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.521871E+00 | loss scale: 2048.0 | grad norm: 88801.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7580/ 159576 | consumed samples: 322112 | elapsed time per iteration (ms): 19481.2 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.505743E+00 | loss scale: 2048.0 | grad norm: 284454.050 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7590/ 159576 | consumed samples: 323232 | elapsed time per iteration (ms): 19560.8 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.490807E+00 | loss scale: 2048.0 | grad norm: 110863.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7600/ 159576 | consumed samples: 324352 | elapsed time per iteration (ms): 19566.7 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.490352E+00 | loss scale: 2048.0 | grad norm: 99394.185 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7610/ 159576 | consumed samples: 325472 | elapsed time per iteration (ms): 19546.1 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.487664E+00 | loss scale: 2048.0 | grad norm: 98963.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7620/ 159576 | consumed samples: 326592 | elapsed time per iteration (ms): 19448.4 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.495935E+00 | loss scale: 2048.0 | grad norm: 80186.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7630/ 159576 | consumed samples: 327712 | elapsed time per iteration (ms): 19586.5 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.485136E+00 | loss scale: 2048.0 | grad norm: 90794.926 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7640/ 159576 | consumed samples: 328832 | elapsed time per iteration (ms): 19579.4 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.484132E+00 | loss scale: 2048.0 | grad norm: 120050.606 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7650/ 159576 | consumed samples: 329952 | elapsed time per iteration (ms): 19625.6 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.474982E+00 | loss scale: 2048.0 | grad norm: 132690.701 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7660/ 159576 | consumed samples: 331120 | elapsed time per iteration (ms): 19869.8 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.502007E+00 | loss scale: 2048.0 | grad norm: 141077.545 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7670/ 159576 | consumed samples: 332400 | elapsed time per iteration (ms): 20699.4 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.459695E+00 | loss scale: 2048.0 | grad norm: 170892.684 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7680/ 159576 | consumed samples: 333680 | elapsed time per iteration (ms): 20602.2 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.471451E+00 | loss scale: 2048.0 | grad norm: 186408.144 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7690/ 159576 | consumed samples: 334960 | elapsed time per iteration (ms): 20925.9 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.450164E+00 | loss scale: 2048.0 | grad norm: 126551.055 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7700/ 159576 | consumed samples: 336240 | elapsed time per iteration (ms): 20872.8 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.483758E+00 | loss scale: 2048.0 | grad norm: 113828.612 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-28 01:32:21] PULSE: tr8-104B is running for 7:48:55 since 2021-09-27T17:43:26 (1271196 on 'gpu_p13' partition (r7i7n[6-8],r8i0n[0-8],r8i1n[0-4],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n0,r9i4n8,r9i5n[0-8],r9i6n[0-8],r9i7n[3-6])
iteration 7710/ 159576 | consumed samples: 337520 | elapsed time per iteration (ms): 20786.9 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.474139E+00 | loss scale: 2048.0 | grad norm: 92984.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7720/ 159576 | consumed samples: 338800 | elapsed time per iteration (ms): 20911.7 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.465121E+00 | loss scale: 2048.0 | grad norm: 101949.520 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7730/ 159576 | consumed samples: 340080 | elapsed time per iteration (ms): 20160.2 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.493755E+00 | loss scale: 1024.0 | grad norm: 47045.415 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7740/ 159576 | consumed samples: 341360 | elapsed time per iteration (ms): 20757.9 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.475374E+00 | loss scale: 1024.0 | grad norm: 62044.012 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7750/ 159576 | consumed samples: 342640 | elapsed time per iteration (ms): 20801.0 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.480064E+00 | loss scale: 1024.0 | grad norm: 55223.754 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7760/ 159576 | consumed samples: 343920 | elapsed time per iteration (ms): 20712.1 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.477321E+00 | loss scale: 1024.0 | grad norm: 75612.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7770/ 159576 | consumed samples: 345200 | elapsed time per iteration (ms): 20773.8 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.486430E+00 | loss scale: 1024.0 | grad norm: 57309.889 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7780/ 159576 | consumed samples: 346480 | elapsed time per iteration (ms): 20686.3 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.465924E+00 | loss scale: 1024.0 | grad norm: 78208.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7790/ 159576 | consumed samples: 347760 | elapsed time per iteration (ms): 20744.2 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.439983E+00 | loss scale: 1024.0 | grad norm: 85978.082 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7800/ 159576 | consumed samples: 349040 | elapsed time per iteration (ms): 20858.0 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.466323E+00 | loss scale: 1024.0 | grad norm: 83254.794 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7810/ 159576 | consumed samples: 350320 | elapsed time per iteration (ms): 20728.1 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.452026E+00 | loss scale: 1024.0 | grad norm: 82300.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7820/ 159576 | consumed samples: 351600 | elapsed time per iteration (ms): 20746.4 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.471143E+00 | loss scale: 1024.0 | grad norm: 70196.821 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7830/ 159576 | consumed samples: 352880 | elapsed time per iteration (ms): 20801.6 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.484294E+00 | loss scale: 1024.0 | grad norm: 52460.842 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7840/ 159576 | consumed samples: 354160 | elapsed time per iteration (ms): 20885.5 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.492403E+00 | loss scale: 1024.0 | grad norm: 61833.655 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7850/ 159576 | consumed samples: 355440 | elapsed time per iteration (ms): 20657.1 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.466279E+00 | loss scale: 1024.0 | grad norm: 62285.100 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7860/ 159576 | consumed samples: 356720 | elapsed time per iteration (ms): 19964.7 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.448762E+00 | loss scale: 512.0 | grad norm: 76192.061 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7870/ 159576 | consumed samples: 358000 | elapsed time per iteration (ms): 20780.6 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.468709E+00 | loss scale: 512.0 | grad norm: 27166.098 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7880/ 159576 | consumed samples: 359280 | elapsed time per iteration (ms): 20507.3 | learning rate: 
6.000E-05 | global batch size: 128 | lm loss: 6.619281E+00 | loss scale: 512.0 | grad norm: 27451.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-28 02:32:25] PULSE: tr8-104B is scheduled to start in 17:52:43 (at 2021-09-28T20:25:09) (1277218 on 'gpu_p13' partition) [2021-09-28 02:32:25] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1277216 on 'gpu_p13' partition) [2021-09-28 02:32:25] PULSE: tr8-104B is running for 8:48:59 since 2021-09-27T17:43:26 (1271196 on 'gpu_p13' partition (r7i7n[6-8],r8i0n[0-8],r8i1n[0-4],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n0,r9i4n8,r9i5n[0-8],r9i6n[0-8],r9i7n[3-6]) iteration 7890/ 159576 | consumed samples: 360560 | elapsed time per iteration (ms): 20685.2 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.639037E+00 | loss scale: 512.0 | grad norm: 21160.659 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7900/ 159576 | consumed samples: 361840 | elapsed time per iteration (ms): 20486.0 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.220924E+00 | loss scale: 512.0 | grad norm: 53815.762 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7910/ 159576 | consumed samples: 363120 | elapsed time per iteration (ms): 20468.9 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.521174E+00 | loss scale: 512.0 | grad norm: 36754.779 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7920/ 159576 | consumed samples: 364400 | elapsed time per iteration (ms): 20813.9 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.992101E+00 | loss scale: 512.0 | grad norm: 30259.595 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7930/ 159576 | consumed samples: 365680 | elapsed time per iteration (ms): 20655.4 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.722378E+00 | loss scale: 512.0 | grad norm: 10480.959 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7940/ 159576 | consumed samples: 366960 | elapsed time per iteration (ms): 20950.6 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.599706E+00 | loss scale: 512.0 | grad norm: 6665.941 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7950/ 159576 | consumed samples: 368240 | elapsed time per iteration (ms): 20603.6 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.160472E+00 | loss scale: 512.0 | grad norm: 8111.009 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7960/ 159576 | consumed samples: 369520 | elapsed time per iteration (ms): 20624.6 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.089828E+00 | loss scale: 512.0 | grad norm: 6778.536 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7970/ 159576 | consumed samples: 370800 | elapsed time per iteration (ms): 20691.0 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.228701E+00 | loss scale: 512.0 | grad norm: 20747.579 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7980/ 159576 | consumed samples: 372080 | elapsed time per iteration 
(ms): 20583.0 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.185581E+00 | loss scale: 512.0 | grad norm: 12240.877 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7990/ 159576 | consumed samples: 373360 | elapsed time per iteration (ms): 20386.0 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.279394E+00 | loss scale: 512.0 | grad norm: 11910.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-28 03:13:26,386] [INFO] [logging.py:68:log_dist] [Rank 0] step=8000, skipped=20, lr=[5.99997953854461e-05, 5.99997953854461e-05], mom=[(0.9, 0.999), (0.9, 0.999)] iteration 8000/ 159576 | consumed samples: 374640 | elapsed time per iteration (ms): 20647.9 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.117274E+00 | loss scale: 512.0 | grad norm: 6363.554 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) steps: 8000 loss: 6.9948 iter time (s): 0.010 samples/sec: 12373.041 ------------------------------------------------------------------------------------------------ validation loss at iteration 8000 | lm loss value: 6.930026E+00 | lm loss PPL: 1.022520E+03 | ------------------------------------------------------------------------------------------------ iteration 8010/ 159576 | consumed samples: 375920 | elapsed time per iteration (ms): 22250.2 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.907596E+00 | loss scale: 512.0 | grad norm: 5175.818 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8020/ 159576 | consumed samples: 377200 | elapsed time per iteration (ms): 20702.8 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.903972E+00 | loss scale: 512.0 | grad norm: 8915.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8030/ 159576 | consumed samples: 378544 | elapsed time per iteration (ms): 21181.5 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.942516E+00 | loss scale: 512.0 | grad norm: 8113.065 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8040/ 159576 | consumed samples: 379984 | elapsed time per iteration (ms): 21914.5 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.923864E+00 | loss scale: 512.0 | grad norm: 19249.730 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8050/ 159576 | consumed samples: 381424 | elapsed time per iteration (ms): 21865.5 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.876669E+00 | loss scale: 512.0 | grad norm: 7890.746 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-28 03:32:27] PULSE: tr8-104B is scheduled to start in 19:12:32 (at 2021-09-28T22:45:00) (1277218 on 'gpu_p13' partition) [2021-09-28 03:32:27] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1277295_[1-10%1] on 'gpu_p13' partition) [2021-09-28 03:32:27] PULSE: tr8-104B is running for 9:49:01 since 2021-09-27T17:43:26 (1271196 on 'gpu_p13' partition (r7i7n[6-8],r8i0n[0-8],r8i1n[0-4],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n0,r9i4n8,r9i5n[0-8],r9i6n[0-8],r9i7n[3-6]) iteration 8060/ 159576 | consumed samples: 382864 | elapsed time per iteration (ms): 21779.1 | learning rate: 6.000E-05 | 
global batch size: 144 | lm loss: 6.788055E+00 | loss scale: 512.0 | grad norm: 9618.538 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8070/ 159576 | consumed samples: 384304 | elapsed time per iteration (ms): 21643.3 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.808229E+00 | loss scale: 512.0 | grad norm: 8857.044 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8080/ 159576 | consumed samples: 385744 | elapsed time per iteration (ms): 21639.1 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.901846E+00 | loss scale: 512.0 | grad norm: 8983.602 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8090/ 159576 | consumed samples: 387184 | elapsed time per iteration (ms): 22052.4 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.863363E+00 | loss scale: 512.0 | grad norm: 9399.920 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8100/ 159576 | consumed samples: 388624 | elapsed time per iteration (ms): 21866.1 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.843295E+00 | loss scale: 512.0 | grad norm: 8690.802 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8110/ 159576 | consumed samples: 390064 | elapsed time per iteration (ms): 21853.1 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.893594E+00 | loss scale: 512.0 | grad norm: 13780.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8120/ 159576 | consumed samples: 391504 | elapsed time per iteration (ms): 21812.6 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.924708E+00 | loss scale: 512.0 | grad norm: 7097.791 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8130/ 159576 | consumed samples: 392944 | elapsed time per iteration (ms): 21586.9 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.829758E+00 | loss scale: 512.0 | grad norm: 7266.647 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8140/ 159576 | consumed samples: 394384 | elapsed time per iteration (ms): 21935.4 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.820535E+00 | loss scale: 512.0 | grad norm: 7758.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8150/ 159576 | consumed samples: 395824 | elapsed time per iteration (ms): 21921.3 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.822125E+00 | loss scale: 512.0 | grad norm: 6965.512 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8160/ 159576 | consumed samples: 397264 | elapsed time per iteration (ms): 21703.6 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.756792E+00 | loss scale: 512.0 | grad norm: 9871.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8170/ 159576 | consumed samples: 398704 | elapsed time per iteration (ms): 21847.9 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.773450E+00 | loss scale: 512.0 | grad norm: 12746.115 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8180/ 159576 | 
consumed samples: 400144 | elapsed time per iteration (ms): 21833.8 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.785934E+00 | loss scale: 512.0 | grad norm: 5598.866 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8190/ 159576 | consumed samples: 401584 | elapsed time per iteration (ms): 21797.4 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.870234E+00 | loss scale: 512.0 | grad norm: 6782.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8200/ 159576 | consumed samples: 403024 | elapsed time per iteration (ms): 21810.1 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.838039E+00 | loss scale: 512.0 | grad norm: 9577.527 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8210/ 159576 | consumed samples: 404464 | elapsed time per iteration (ms): 21905.3 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.807652E+00 | loss scale: 512.0 | grad norm: 11918.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-28 04:33:02] PULSE: tr8-104B is scheduled to start in 18:11:57 (at 2021-09-28T22:45:00) (1277218 on 'gpu_p13' partition) [2021-09-28 04:33:02] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1277295_[1-10%1] on 'gpu_p13' partition) [2021-09-28 04:33:02] PULSE: tr8-104B is running for 10:49:36 since 2021-09-27T17:43:26 (1271196 on 'gpu_p13' partition (r7i7n[6-8],r8i0n[0-8],r8i1n[0-4],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n0,r9i4n8,r9i5n[0-8],r9i6n[0-8],r9i7n[3-6]) iteration 8220/ 159576 | consumed samples: 405904 | elapsed time per iteration (ms): 21977.1 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.819595E+00 | loss scale: 512.0 | grad norm: 6882.121 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8230/ 159576 | consumed samples: 407344 | elapsed time per iteration (ms): 21630.3 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.880849E+00 | loss scale: 512.0 | grad norm: 17414.946 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8240/ 159576 | consumed samples: 408784 | elapsed time per iteration (ms): 21894.4 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.930541E+00 | loss scale: 512.0 | grad norm: 7836.035 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8250/ 159576 | consumed samples: 410224 | elapsed time per iteration (ms): 21731.4 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.906449E+00 | loss scale: 512.0 | grad norm: 7978.667 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8260/ 159576 | consumed samples: 411664 | elapsed time per iteration (ms): 21776.5 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.893109E+00 | loss scale: 512.0 | grad norm: 9114.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8270/ 159576 | consumed samples: 413104 | elapsed time per iteration (ms): 22166.2 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.885992E+00 | loss scale: 512.0 | grad norm: 13085.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| time (ms) iteration 8280/ 159576 | consumed samples: 414544 | elapsed time per iteration (ms): 21762.3 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.789729E+00 | loss scale: 512.0 | grad norm: 11443.626 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8290/ 159576 | consumed samples: 415984 | elapsed time per iteration (ms): 21743.1 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.784861E+00 | loss scale: 512.0 | grad norm: 10437.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8300/ 159576 | consumed samples: 417424 | elapsed time per iteration (ms): 21878.0 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.831153E+00 | loss scale: 512.0 | grad norm: 6842.857 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8310/ 159576 | consumed samples: 418864 | elapsed time per iteration (ms): 21680.7 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.847891E+00 | loss scale: 512.0 | grad norm: 8236.158 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8320/ 159576 | consumed samples: 420304 | elapsed time per iteration (ms): 21650.4 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.831273E+00 | loss scale: 512.0 | grad norm: 10757.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8330/ 159576 | consumed samples: 421744 | elapsed time per iteration (ms): 21761.1 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.866577E+00 | loss scale: 512.0 | grad norm: 9414.173 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8340/ 159576 | consumed samples: 423184 | elapsed time per iteration (ms): 22000.3 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.927114E+00 | loss scale: 512.0 | grad norm: 22264.468 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8350/ 159576 | consumed samples: 424624 | elapsed time per iteration (ms): 21732.0 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 7.098891E+00 | loss scale: 512.0 | grad norm: 10280.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8360/ 159576 | consumed samples: 426160 | elapsed time per iteration (ms): 22517.6 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 6.958164E+00 | loss scale: 1024.0 | grad norm: 13178.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8370/ 159576 | consumed samples: 427760 | elapsed time per iteration (ms): 23182.1 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 6.889060E+00 | loss scale: 1024.0 | grad norm: 18842.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8380/ 159576 | consumed samples: 429360 | elapsed time per iteration (ms): 23097.1 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 6.878168E+00 | loss scale: 1024.0 | grad norm: 18421.706 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-28 05:32:36] PULSE: tr8-104B is scheduled to start in 17:12:23 (at 2021-09-28T22:45:00) (1277218 on 'gpu_p13' partition) [2021-09-28 05:32:36] PULSE: tr8-104B is waiting for the 
previous job to finish before scheduling a new one using the dependency mechanism (1277295_[1-10%1] on 'gpu_p13' partition) [2021-09-28 05:32:36] PULSE: tr8-104B is running for 11:49:10 since 2021-09-27T17:43:26 (1271196 on 'gpu_p13' partition (r7i7n[6-8],r8i0n[0-8],r8i1n[0-4],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n0,r9i4n8,r9i5n[0-8],r9i6n[0-8],r9i7n[3-6]) iteration 8390/ 159576 | consumed samples: 430960 | elapsed time per iteration (ms): 22911.1 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 6.836983E+00 | loss scale: 1024.0 | grad norm: 21055.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8400/ 159576 | consumed samples: 432560 | elapsed time per iteration (ms): 23311.7 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 6.867126E+00 | loss scale: 1024.0 | grad norm: 13309.684 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8410/ 159576 | consumed samples: 434160 | elapsed time per iteration (ms): 22945.0 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 6.896465E+00 | loss scale: 1024.0 | grad norm: 24249.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8420/ 159576 | consumed samples: 435760 | elapsed time per iteration (ms): 22797.0 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 6.923830E+00 | loss scale: 1024.0 | grad norm: 16621.010 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8430/ 159576 | consumed samples: 437360 | elapsed time per iteration (ms): 23019.9 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 6.940806E+00 | loss scale: 1024.0 | grad norm: 15050.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8440/ 159576 | consumed samples: 438960 | elapsed time per iteration (ms): 23026.2 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 6.984757E+00 | loss scale: 1024.0 | grad norm: 22968.730 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8450/ 159576 | consumed samples: 440560 | elapsed time per iteration (ms): 22903.0 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 6.970832E+00 | loss scale: 1024.0 | grad norm: 25206.012 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8460/ 159576 | consumed samples: 442160 | elapsed time per iteration (ms): 22992.7 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 6.992513E+00 | loss scale: 1024.0 | grad norm: 9219.678 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8470/ 159576 | consumed samples: 443760 | elapsed time per iteration (ms): 23036.6 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.053975E+00 | loss scale: 1024.0 | grad norm: 9743.104 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8480/ 159576 | consumed samples: 445360 | elapsed time per iteration (ms): 22710.5 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.087634E+00 | loss scale: 1024.0 | grad norm: 36403.836 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8490/ 159576 | consumed samples: 446960 | elapsed time per iteration (ms): 22994.9 | learning rate: 6.000E-05 | global batch 
size: 160 | lm loss: 7.142048E+00 | loss scale: 1024.0 | grad norm: 8807.945 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8500/ 159576 | consumed samples: 448560 | elapsed time per iteration (ms): 22707.3 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.160313E+00 | loss scale: 1024.0 | grad norm: 9148.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8510/ 159576 | consumed samples: 450160 | elapsed time per iteration (ms): 22963.9 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.277474E+00 | loss scale: 1024.0 | grad norm: 43448.626 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8520/ 159576 | consumed samples: 451760 | elapsed time per iteration (ms): 19193.8 | learning rate: 6.000E-05 | global batch size: 160 | loss scale: 64.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8530/ 159576 | consumed samples: 453360 | elapsed time per iteration (ms): 15554.5 | learning rate: 6.000E-05 | global batch size: 160 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8540/ 159576 | consumed samples: 454960 | elapsed time per iteration (ms): 15434.8 | learning rate: 6.000E-05 | global batch size: 160 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8550/ 159576 | consumed samples: 456560 | elapsed time per iteration (ms): 15729.0 | learning rate: 6.000E-05 | global batch size: 160 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-28 06:32:50] PULSE: tr8-104B is scheduled to start in 17:29:26 (at 2021-09-29T00:02:17) (1277218 on 'gpu_p13' partition) [2021-09-28 06:32:50] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1277295_[1-10%1] on 'gpu_p13' partition) [2021-09-28 06:32:50] PULSE: tr8-104B is running for 12:49:24 since 2021-09-27T17:43:26 (1271196 on 'gpu_p13' partition (r7i7n[6-8],r8i0n[0-8],r8i1n[0-4],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n0,r9i4n8,r9i5n[0-8],r9i6n[0-8],r9i7n[3-6]) iteration 8560/ 159576 | consumed samples: 458160 | elapsed time per iteration (ms): 15526.6 | learning rate: 6.000E-05 | global batch size: 160 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8570/ 159576 | consumed samples: 459760 | elapsed time per iteration (ms): 15343.9 | learning rate: 6.000E-05 | global batch size: 160 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8580/ 159576 | consumed samples: 461360 | elapsed time per iteration (ms): 15516.0 | learning rate: 6.000E-05 | global batch size: 160 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8590/ 159576 | consumed samples: 462960 | elapsed time per iteration (ms): 15788.5 | learning rate: 6.000E-05 | global batch size: 160 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) 
iteration 8600/ 159576 | consumed samples: 464560 | elapsed time per iteration (ms): 15421.5 | learning rate: 6.000E-05 | global batch size: 160 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8610/ 159576 | consumed samples: 466160 | elapsed time per iteration (ms): 15365.4 | learning rate: 6.000E-05 | global batch size: 160 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8620/ 159576 | consumed samples: 467760 | elapsed time per iteration (ms): 15460.6 | learning rate: 6.000E-05 | global batch size: 160 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8630/ 159576 | consumed samples: 469360 | elapsed time per iteration (ms): 15794.2 | learning rate: 6.000E-05 | global batch size: 160 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8640/ 159576 | consumed samples: 470960 | elapsed time per iteration (ms): 15928.5 | learning rate: 6.000E-05 | global batch size: 160 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8650/ 159576 | consumed samples: 472560 | elapsed time per iteration (ms): 15514.8 | learning rate: 6.000E-05 | global batch size: 160 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8660/ 159576 | consumed samples: 474320 | elapsed time per iteration (ms): 16639.1 | learning rate: 6.000E-05 | global batch size: 176 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8670/ 159576 | consumed samples: 476080 | elapsed time per iteration (ms): 16569.6 | learning rate: 6.000E-05 | global batch size: 176 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8680/ 159576 | consumed samples: 477840 | elapsed time per iteration (ms): 16695.6 | learning rate: 6.000E-05 | global batch size: 176 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8690/ 159576 | consumed samples: 479600 | elapsed time per iteration (ms): 16700.3 | learning rate: 6.000E-05 | global batch size: 176 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8700/ 159576 | consumed samples: 481360 | elapsed time per iteration (ms): 16569.3 | learning rate: 6.000E-05 | global batch size: 176 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8710/ 159576 | consumed samples: 483120 | elapsed time per iteration (ms): 16526.6 | learning rate: 6.000E-05 | global batch size: 176 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8720/ 159576 | consumed samples: 484880 | elapsed time per iteration (ms): 16370.8 | learning rate: 6.000E-05 | global batch size: 176 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8730/ 159576 | consumed samples: 486640 | elapsed time per iteration (ms): 16678.1 | learning rate: 6.000E-05 | global batch size: 176 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8740/ 159576 | consumed samples: 488400 | elapsed time per iteration (ms): 16715.4 | learning rate: 6.000E-05 | global batch size: 176 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8750/ 159576 | consumed samples: 490160 | elapsed time per iteration (ms): 16605.2 | learning rate: 6.000E-05 | global batch size: 176 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8760/ 159576 | consumed samples: 491920 | elapsed time per iteration (ms): 16522.8 | learning rate: 6.000E-05 | global batch size: 176 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8770/ 159576 | consumed samples: 493680 | elapsed time per iteration (ms): 16607.3 | learning rate: 6.000E-05 | global batch size: 176 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-28 07:32:48] PULSE: tr8-104B is scheduled to start in 17:38:05 (at 2021-09-29T01:10:54) (1277218 on 'gpu_p13' partition)
[2021-09-28 07:32:48] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1277295_[1-10%1] on 'gpu_p13' partition)
[2021-09-28 07:32:48] PULSE: tr8-104B is running for 13:49:22 since 2021-09-27T17:43:26 (1271196 on 'gpu_p13' partition (r7i7n[6-8],r8i0n[0-8],r8i1n[0-4],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n0,r9i4n8,r9i5n[0-8],r9i6n[0-8],r9i7n[3-6])
iteration 8780/ 159576 | consumed samples: 495440 | elapsed time per iteration (ms): 16798.5 | learning rate: 6.000E-05 | global batch size: 176 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8790/ 159576 | consumed samples: 497200 | elapsed time per iteration (ms): 16594.8 | learning rate: 6.000E-05 | global batch size: 176 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8800/ 159576 | consumed samples: 498960 | elapsed time per iteration (ms): 16863.3 | learning rate: 6.000E-05 | global batch size: 176 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
srun: Job step aborted: Waiting up to 62 seconds for job step to finish.
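Note what the loss scale column is telling us here. Under dynamic fp16 loss scaling the scale is halved every time an overflow (an inf/nan in the gradients) is detected, so the slide from 1024 down to 64 and then to 1.0 between iterations 8510 and 8530, together with the lm loss field disappearing from the log and the grad norm stuck repeating its last value (5533.127), suggests that nearly every step was overflowing and being skipped. A minimal sketch of that halving logic, with illustrative names and constants rather than the actual DeepSpeed implementation:

# Sketch of dynamic fp16 loss scaling, for reading the "loss scale" column;
# names and constants are illustrative, not DeepSpeed's implementation.
class DynamicLossScaler:
    def __init__(self, init_scale=2048.0, scale_factor=2.0,
                 scale_window=1000, min_scale=1.0):
        self.scale = init_scale
        self.scale_factor = scale_factor
        self.scale_window = scale_window  # clean steps before growing back
        self.min_scale = min_scale
        self.good_steps = 0

    def update(self, overflow):
        """Call once per step; returns True if the step must be skipped."""
        if overflow:
            # Halve on every overflow: repeated overflows walk the scale
            # down fast (1024 -> 64 is four halvings inside a single
            # 10-iteration logging window) until it bottoms out at min_scale.
            self.scale = max(self.scale / self.scale_factor, self.min_scale)
            self.good_steps = 0
            return True
        self.good_steps += 1
        if self.good_steps >= self.scale_window:
            self.scale *= self.scale_factor  # cautiously raise it again
            self.good_steps = 0
        return False

With the scale pinned at 1.0 and overflows still occurring, no useful optimizer steps can go through, which is presumably why the job was killed shortly afterwards.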
Killing subprocess 30115
Killing subprocess 30116
Killing subprocess 72376
Killing subprocess 30117
Killing subprocess 72377
Killing subprocess 72378
Killing subprocess 30118
Main process received SIGTERM, exiting
[... hundreds of interleaved "Killing subprocess <pid>" and "Main process received SIGTERM, exiting" lines from the remaining ranks ...]
slurmstepd: error: *** STEP 1271196.0 ON r7i7n6 CANCELLED AT 2021-09-28T07:42:47 ***
[... remaining shutdown messages ...]
[2021-09-28 08:32:52] PULSE: ***ALERT: tr8-104B is not RUNNING or SCHEDULED! Alert someone at Eng WG***
[2021-09-28 09:33:05] PULSE: ***ALERT: tr8-104B is not RUNNING or SCHEDULED! Alert someone at Eng WG***
[2021-09-28 10:33:03] PULSE: ***ALERT: tr8-104B is not RUNNING or SCHEDULED! Alert someone at Eng WG***
[2021-09-28 11:33:17] PULSE: ***ALERT: tr8-104B is not RUNNING or SCHEDULED! Alert someone at Eng WG***
Traceback (most recent call last):
  File "/gpfswork/rech/six/commun/code/tr8-104B/bigscience/tools/slurm-status.py", line 177, in <module>
    main()
  File "/gpfswork/rech/six/commun/code/tr8-104B/bigscience/tools/slurm-status.py", line 172, in main
    send_email_alert_job_not_scheduled(args.job_name)
  File "/gpfswork/rech/six/commun/code/tr8-104B/bigscience/tools/slurm-status.py", line 61, in send_email_alert_job_not_scheduled
    send_email(subject, body)
  File "/gpfswork/rech/six/commun/code/tr8-104B/bigscience/tools/slurm-status.py", line 39, in send_email
    server = smtplib.SMTP("localhost")
  File "/gpfslocalsup/pub/anaconda-py3/2020.02/lib/python3.7/smtplib.py", line 251, in __init__
    (code, msg) = self.connect(host, port)
  File "/gpfslocalsup/pub/anaconda-py3/2020.02/lib/python3.7/smtplib.py", line 336, in connect
    self.sock = self._get_socket(host, port, self.timeout)
  File "/gpfslocalsup/pub/anaconda-py3/2020.02/lib/python3.7/smtplib.py", line 307, in _get_socket
    self.source_address)
  File "/gpfslocalsup/pub/anaconda-py3/2020.02/lib/python3.7/socket.py", line 728, in create_connection
    raise err
  File "/gpfslocalsup/pub/anaconda-py3/2020.02/lib/python3.7/socket.py", line 716, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused
[2021-09-28 12:33:29] PULSE: ***ALERT: tr8-104B is not RUNNING or SCHEDULED! Alert someone at Eng WG***
[2021-09-28 13:33:44] PULSE: ***ALERT: tr8-104B is not RUNNING or SCHEDULED! Alert someone at Eng WG***
[2021-09-28 14:34:11] PULSE: ***ALERT: tr8-104B is not RUNNING or SCHEDULED! Alert someone at Eng WG***
[2021-09-28 15:33:54] PULSE: ***ALERT: tr8-104B is not RUNNING or SCHEDULED! Alert someone at Eng WG***
[2021-09-28 16:34:11] PULSE: ***ALERT: tr8-104B is not RUNNING or SCHEDULED! Alert someone at Eng WG***
[... the identical ConnectionRefusedError traceback repeats ...]
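On top of the training failure, the watchdog itself is crashing: send_email in tools/slurm-status.py opens smtplib.SMTP("localhost") with no error handling, so on a node without a local mail server the ConnectionRefusedError kills the script and the alert is only printed, never mailed. A minimal hardening sketch, assuming a send_email of roughly the shape the traceback implies (the addresses are placeholders, and this is not the repo's actual code):

# Sketch only: a send_email that survives a refused SMTP connection
# instead of crashing the watchdog. Body inferred from the traceback
# above; addresses are placeholders.
import smtplib
import sys
from email.message import EmailMessage

def send_email(subject, body):
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = "pulse-watchdog@example.com"  # placeholder
    msg["To"] = "eng-wg@example.com"            # placeholder
    msg.set_content(body)
    try:
        with smtplib.SMTP("localhost", timeout=10) as server:
            server.send_message(msg)
    except (OSError, smtplib.SMTPException) as exc:
        # No SMTP server reachable: log and carry on, so the PULSE loop
        # keeps printing alerts instead of dying with a traceback.
        print(f"send_email failed ({exc}); alert not mailed: {subject}",
              file=sys.stderr)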